Hybrid Quantum-Classical Computer for Bayesian Inference with Engineered Likelihood Functions for Robust Amplitude Estimation

ABSTRACT

A hybrid quantum-classical (HQC) computer takes advantage of the available quantum coherence to maximally enhance the power of sampling on noisy quantum devices, reducing measurement number and runtime compared to VQE. The HQC computer derives inspiration from quantum metrology, phase estimation, and the more recent “alpha-VQE” proposal, arriving at a general formulation that is robust to error and does not require ancilla qubits. The HQC computer uses the “engineered likelihood function” (ELF) to carry out Bayesian inference. The ELF formalism enhances the quantum advantage in sampling as the physical hardware transitions from the regime of noisy intermediate-scale quantum computers into that of quantum error corrected ones. This technique speeds up a central component of many quantum algorithms, with applications including chemistry, materials, finance, and beyond.

BACKGROUND

Quantum computers promise to solve industry-critical problems which are otherwise unsolvable or only very inefficiently addressable using classical computers. Key application areas include chemistry and materials, bioscience and bioinformatics, logistics, and finance. Interest in quantum computing has recently surged, in part due to a wave of advances in the performance of ready-to-use quantum computers. However, near-term quantum devices are still extremely limited in resources, preventing the deployment of quantum computers on problems of practical interest.

A recent flurry of methods which cater to the limitations of near-term quantum devices have drawn significant attention. These methods include the variational quantum eigensolver (VQE), quantum approximate optimization algorithm (QAOA) and variants, variational quantum linear systems solver, other quantum algorithms leveraging the variational principles, and quantum machine learning algorithms. In spite of such algorithmic innovations, many of these approaches have appeared to be impractical for commercially-relevant problems owing to their high cost in terms of number of measurements and runtime. Yet, methods offering a quadratic speedup in runtime, such as phase estimation, demand quantum resources that are far beyond the reach of near-term devices for moderately large problem instances.

SUMMARY

The number of measurements demanded by hybrid quantum-classical algorithms such as the variational quantum eigensolver (VQE) is prohibitively high for many problems of practical value. Quantum algorithms which reduce this cost (e.g. quantum amplitude and phase estimation) require error rates that are too low for near-term implementation. Embodiments of the present invention include hybrid quantum-classical (HQC) computers, and methods performed by HQC computers, which take advantage of the available quantum coherence to maximally enhance the power of sampling on noisy quantum devices, reducing measurement number and runtime compared to VQE. Such embodiments derive inspiration from quantum metrology, phase estimation, and the more recent “alpha-VQE” proposal, arriving at a general formulation that is robust to error and does not require ancilla qubits. The central object of this method is what we call the “engineered likelihood function” (ELF), used for carrying out Bayesian inference. Embodiments of the present invention use the ELF formalism to enhance the quantum advantage in sampling as the physical hardware transitions from the regime of noisy intermediate-scale quantum computers into that of quantum error corrected ones. This technique speeds up a central component of many quantum algorithms, with applications including chemistry, materials, finance, and beyond.

Other features and advantages of various aspects and embodiments of the present invention will become apparent from the following description and from the claims.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a diagram of a quantum computer according to one embodiment of the present invention;

FIG. 2A is a flowchart of a method performed by the quantum computer of FIG. 1 according to one embodiment of the present invention;

FIG. 2B is a diagram of a hybrid quantum-classical computer which performs quantum annealing according to one embodiment of the present invention;

FIG. 3 is a diagram of a hybrid quantum-classical computer according to one embodiment of the present invention;

FIG. 4 is a diagram of a hybrid quantum-classical (HQC) computer for performing quantum amplitude estimation according to one embodiment of the present invention;

FIGS. 5A-5C illustrate quantum circuits of standard sampling and of some embodiments of the present invention, along with their corresponding likelihood functions;

FIGS. 6A-6B illustrate plots showing the dependence of the Fisher information on various likelihood functions;

FIG. 7 illustrates operations used for generating samples that correspond to an engineered likelihood function according to one embodiment of the present invention;

FIG. 8 illustrates an algorithm implemented by embodiments of the present invention;

FIGS. 9A-9B, 10, 11A-11B, and 12 illustrate various algorithms performed by embodiments of the present invention;

FIG. 13 is a plot of true and fitted likelihood functions according to various embodiments of the present invention;

FIGS. 14A-14B, 15A-15B, 16A-16B, 17A-17B, and 18A-18B show plots illustrating performance of various embodiments of the present invention;

FIG. 19 shows the R₀ factors of various embodiments of the present invention;

FIGS. 20A-20B and 21A-21B show plots illustrating performance of various embodiments of the present invention;

FIGS. 22A-22B show the R₀ factors of various embodiments of the present invention;

FIGS. 23A-23B and 24A-24B show plots illustrating performance of various embodiments of the present invention;

FIGS. 25A-25B show the R₀ factors of various embodiments of the present invention;

FIG. 26 shows plots illustrating the runtime to target accuracy of various embodiments of the present invention;

FIGS. 27-28 show quantum circuits implemented according to embodiments of the present invention;

FIGS. 29A-29B, 30A-30B, 31A-31B and 32 show algorithms implemented according to embodiments of the present invention; and

FIG. 33 shows true and fitted likelihood functions according to embodiments of the present invention.

DETAILED DESCRIPTION

Embodiments of the present invention are directed to a hybrid quantum-classical (HQC) computer which performs quantum amplitude estimation. Referring to FIG. 4, a flow diagram is shown of an HQC computer 430, including both a quantum computer 432 and a classical computer 434, which performs a method of quantum amplitude estimation according to one embodiment of the present invention. In the block 404, which is performed with the classical computer 434, a plurality of quantum-circuit-parameter values is selected to optimize an accuracy-improvement rate of a statistic estimating an expectation value 〈s|P|s〉 of an observable P 400 with respect to a quantum state |s〉 402.

In embodiments, the statistic is a sample mean calculated from a plurality of values sampled from a random variable. In the present discussion, these sampled values are obtained by measuring qubits of the quantum computer 432. However, the statistic may alternatively be a skewness, kurtosis, quantile, or another type of statistic without departing from the scope hereof. The statistic is an estimator of the expectation value 〈s|P|s〉, and may be either biased or unbiased. The plurality of values may be modeled according to a probability distribution, in which case the statistic may represent a parameter of the probability distribution. For example, the statistic may represent a mean of a Gaussian distribution, as described in more detail below in Section 3.2.

The quantum-circuit-parameter values are real numbers that control how quantum gates operate on qubits. In the present discussion, each quantum circuit can be represented as a sequence of quantum gates in which each quantum gate of the sequence is controlled by one of the quantum-circuit-parameter values. For example, each of the quantum-circuit-parameter values may represent an angle by which the state of one or more qubits is rotated in a corresponding Hilbert space.

The accuracy-improvement rate is a function that expresses by how much a corresponding accuracy of the statistic improves with each iteration of the present method embodiments. The accuracy-improvement rate is a function of the quantum-circuit parameters, and may additionally be a function of the statistic (e.g., the mean). The accuracy is any quantitative measure of an error of the statistic. For example, the accuracy may be a mean squared error, standard deviation, variance, mean absolute error, or another moment of the error. Alternatively, the accuracy may be an information metric, such as Fisher information or information entropy. Examples that use variance for the accuracy are described in more detail below (e.g., see Eqn. 36). In these examples, the accuracy-improvement rate may be the variance reduction factor introduced in Eqn. 38. Alternatively, the accuracy-improvement rate may be Fisher information (e.g., see Eqn. 42). However, the accuracy-improvement rate may be another function quantifying improvements in the accuracy without departing from the scope hereof.

In some embodiments, the plurality of quantum-circuit-parameter values is selected using one of coordinate ascent and gradient descent. Both of these techniques are described in more detail in Section 4.1.1.

In the block 406, which is performed with the quantum computer 432, a sequence of alternating first and second generalized reflection operators is applied to one or more qubits of the quantum computer 432 to transform the one or more qubits from the quantum state |s〉 into a reflected quantum state. Each of the first and second generalized reflection operators is controlled according to a corresponding one of the plurality of quantum-circuit-parameter values. The operators U(x) and V(y) described in Section 3.1 are examples of the first and second generalized reflection operators, respectively. The operator Q(x) introduced in Eqn. 19 is one example of a sequence of alternating first and second generalized reflection operators, where the vector x represents the plurality of quantum-circuit-parameter values. The sequence of first and second generalized reflection operators and the observable P may define a bias of an engineered likelihood function, as described below with respect to Eqn. 26.

In the block 408, which is also performed with the quantum computer 432, the one or more qubits in the reflected quantum state are measured with respect to the observable P to obtain a set of measurement outcomes. In the block 410, which is performed on the classical computer 434, the statistic is updated with the set of measurement outcomes to obtain an estimate of 〈s|P|s〉 with higher accuracy.

The method may further include outputting the statistic after said updating. Alternatively, the method may iterate over the blocks 404, 406, 408, and 410. In some embodiments, the method further includes updating, on the classical computer 434 and with the set of measurement outcomes, an accuracy estimate of the statistic. The accuracy estimate is the calculated value of the accuracy described above (e.g., variance). In these embodiments, the method iterates over the blocks 404, 406, 408, and 410 until the accuracy estimate falls below a threshold.

In some embodiments, the statistic is updated in the block 410 by updating a prior distribution with the plurality of measurements to obtain a posterior distribution, and calculating an updated statistic from the posterior distribution.

In some embodiments, the plurality of quantum-circuit-parameter values is selected based on the statistic and an accuracy estimate of the statistic. The plurality of quantum-circuit-parameter values may further be selected based on a fidelity representing errors occurring during said applying and measuring.

1.1. Introduction

The combination of phase estimation and the Bayesian perspective gives rise to Bayesian phase estimation techniques that are more suitable than the early proposals for noisy quantum devices capable of realizing only limited-depth quantum circuits. Adopting the notation from above, the circuit parameters are θ = (m, β) and the goal is to estimate the phase Π in an eigenvalue e^(i arccos Π) of the operator U. An important note is that the likelihood function here,

$\begin{matrix} {p\left( {d\left| {m,\beta,\text{Π}} \right)} \right) = \frac{1}{2}\left( {1 + \left( {- 1} \right)^{d}\left\lbrack {\cos(\beta)T_{m}\left( \text{Π} \right) + \sin(\beta)U_{m}\left( \text{Π} \right)} \right\rbrack} \right),} & \text{­­­(1)} \end{matrix}$

where T_(m) and U_(m) are Chebyshev polynomials of the first and second kind, respectively, is shared in many settings beyond Bayesian phase estimation. This commonality makes the Bayesian inference machinery used for tasks such as Hamiltonian characterization relevant to phase estimation. In previous work, the exponential advantage of Bayesian inference with a Gaussian prior over other non-adaptive sampling methods is established by showing that the expected posterior variance σ decays exponentially in the number of inference steps. Such exponential convergence comes at the cost of an O(1/σ) amount of quantum coherence required at each inference step. Such scaling is also confirmed in the context of Bayesian phase estimation.

Equipped with the techniques of Bayesian phase estimation as well as the perspective of overlap estimation as an amplitude estimation problem, one may devise a Bayesian inference method for operator measurement that smoothly interpolates between the standard sampling regime and phase estimation regime. This is proposed as “α-VQE”, where the asymptotic scaling for performing an operator measurement is O(1/ε^(α)), with the extremal values of α = 2 corresponding to the standard sampling regime (typically realized in VQE) and α = 1 corresponding to the quantum-enhanced regime where the scaling reaches the Heisenberg limit (typically realized with phase estimation). By varying the parameters for the Bayesian inference one can also achieve α values between 1 and 2. The lower the α value, the deeper the quantum circuit needed for Bayesian phase estimation. This accomplishes the trade-off between quantum coherence and asymptotic speedup for the measurement process.

It is also worth noting that phase estimation is not the only paradigm that can reach the Heisenberg limit for amplitude estimation. In previous works, the authors consider the task of estimating the parameter θ of a quantum state ρ_(θ). A parallel strategy is proposed where m copies of the parametrized circuit for generating ρ_(θ), together with an entangled initial state and measurements in an entangled basis, are used to create states with the parameter θ amplified to mθ. Such amplification can also give rise to likelihood functions that are similar to that in Equation 1. In previous work it is shown that with randomized quantum operations and Bayesian inference one can extract information in fewer iterations than classical sampling even in the presence of noise. In other work on quantum amplitude estimation, circuits with varying numbers m of iterations and numbers N of measurements are considered. A particularly chosen set of pairs (m, N) gives rise to a likelihood function that can be used for inferring the amplitude to be estimated. The Heisenberg limit is demonstrated for one particular likelihood function construction given by the authors. Both works highlight the power of parametrized likelihood functions, making it tempting to investigate their performance under imperfect hardware conditions.

1.2. Main Results

Embodiments of the present invention include systems and methods for estimating the expectation Π = 〈A|O|A〉 where the state |A〉 can be prepared by a quantum circuit A such that |A〉 = A|0〉. Embodiments of the present invention may use a family of quantum circuits, such that, as the circuit deepens with more repetitions of A, it allows for likelihood functions that are polynomial in Π of ever higher degree. As described in the next section with a concrete example, a direct consequence of this increase in polynomial degree is an increase in the power of inference, which can be quantified by Fisher information gain at each inference step. After establishing this “enhanced sampling” technique, embodiments of the present invention may introduce parameters into the quantum circuit and render the resulting likelihood function tunable. Embodiments of the present invention may optimize the parameters for maximal information gain during each step of inference. The following lines of insight emerge from our efforts:

1. The role of noise and error in amplitude estimation: Previous works have revealed the impact of noise on the likelihood function and the output estimation of the Hamiltonian spectrum. The disclosure herein investigates the same for schemes of amplitude estimation used by embodiments of the present invention. The description herein demonstrates that while noise and error do increase the runtime needed for producing an output that is within a specific statistical error tolerance, they do not necessarily introduce systematic bias in the output of the estimation algorithm. Systematic bias in the estimate can be suppressed by using active noise-tailoring techniques and calibrating the effect of noise.

Simulation using realistic error parameters for near-term devices has revealed that the enhanced sampling scheme can outperform VQE in terms of sampling efficiency. Experimental results have also revealed a perspective on tolerating error in quantum algorithm implementation where higher fidelity does not necessarily lead to better algorithmic performance. In certain embodiments of the present invention, there is an optimal circuit fidelity of roughly 0.6 at which the enhanced scheme yields the maximum amount of quantum speedup.

2. The role of likelihood function tunability: Parametrized likelihood functions may be used in phase estimation or amplitude estimation routines. To our knowledge, all of the current methods focus on likelihood functions of the Chebyshev form (Equation 1). For these Chebyshev likelihood functions (CLF), in the presence of noise there are specific values of the parameter Π (the “dead spots”) for which the CLFs are significantly less effective for inference than other values of Π. Embodiments of the present invention may remove those dead spots by engineering the form of the likelihood function with generalized reflection operators whose angle parameters are made tunable.

3. Runtime model for estimation as error rates decrease: Previous works have demonstrated smooth transitions in the asymptotic cost scaling from the O(1/ε²) of VQE to O(1/ε) of phase estimation. Embodiments of the present invention advance this line of thinking by developing a model for estimating the runtime t_(ε) to target accuracy ε using devices with degree of noise λ (c.f. Section 6):

$\begin{matrix} {t_{\varepsilon} \sim O\left( {\frac{\lambda}{\varepsilon^{2}} + \frac{1}{\sqrt{2}\varepsilon} + \sqrt{\left( \frac{\lambda}{\varepsilon^{2}} \right)^{2} + \left( \frac{2\sqrt{2}}{\varepsilon} \right)^{2}}} \right).} & \text{(2)} \end{matrix}$

The model interpolates between the O(1/ε) scaling and O(1/ε²) scaling as a function of λ. Such bounds also allow embodiments of the present invention to be analyzed for quantum speedup as a function of hardware specifications such as the number of qubits and two-qubit fidelity, and therefore estimate runtimes using realistic parameters for current and future hardware.
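The interpolation described by the runtime model can be checked numerically. The following is a minimal sketch, assuming the expression inside O(·) in Equation 2 is evaluated with its constants exactly as written; the function names `runtime_model` and `scaling_exponent` are hypothetical and are introduced here only for illustration.

```python
import math

def runtime_model(eps, lam):
    """Evaluate the argument of O(.) in Equation 2, with constants as written."""
    sampling_term = lam / eps**2
    heisenberg_term = 2.0 * math.sqrt(2.0) / eps
    return (sampling_term
            + 1.0 / (math.sqrt(2.0) * eps)
            + math.hypot(sampling_term, heisenberg_term))

def scaling_exponent(lam, eps_hi=1e-3, eps_lo=1e-4):
    """Estimate a in t ~ eps**(-a) from two target accuracies."""
    ratio = runtime_model(eps_lo, lam) / runtime_model(eps_hi, lam)
    return math.log(ratio) / math.log(eps_hi / eps_lo)

# Sweep the noise parameter from noiseless to strongly noisy.
for lam in (0.0, 1e-3, 1.0):
    print(f"lambda = {lam:g}: exponent ~ {scaling_exponent(lam):.2f}")
```

Under these assumptions the fitted exponent is 1 when λ = 0 and approaches 2 as λ grows, with intermediate values in between.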

Subsequent sections of the present disclosure are organized as follows. Section 2 presents a concrete example of a scheme implemented according to one embodiment of the present invention. Subsequent sections then expand on the general formulation of this scheme according to various embodiments of the present invention. Section 3 describes in detail the general quantum circuit construction for realizing ELFs, and analyzes the structure of ELF in both noisy and noiseless settings. In addition to the quantum circuit scheme, embodiments of the present invention also involve 1) tuning the circuit parameter to maximize information gain, and 2) Bayesian inference for updating the current belief about the distribution of the true value of Π. Section 4 presents heuristic algorithms for both. Numerical results are presented in Section 5, comparing embodiments of the present invention with existing methods based on CLFs. Section 6 discloses a runtime model and derives the expression in (2). Section 7 discloses implications of the disclosed results from a broad perspective of quantum computing.

TABLE 1. Comparison of our scheme with relevant proposals that appear in the literature.

| Scheme | Bayesian inference | Noise consideration | Fully tunable LFs | Requires ancilla | Requires eigenstate |
| --- | --- | --- | --- | --- | --- |
| Knill et al. | No | No | No | Yes | No |
| Svore et al. | No | No | No | Yes | Yes |
| Wiebe and Granade | Yes | Yes | No | Yes | Yes |
| Wang et al. | Yes | Yes | No | Yes | Yes |
| O’Brien et al. | Yes | Yes | No | Yes | No |
| Zintchenko and Wiebe | No | Yes | No | No | No |
| Suzuki et al. | No | No | No | No | No |
| This work (Section 1) | Yes | Yes | Yes | No | No |
| This work (Appendix A) | Yes | Yes | Yes | Yes | No |

Here the list of features includes whether the quantum circuit used in the scheme requires ancilla qubits in addition to qubits holding the state for overlap estimation, whether the scheme uses Bayesian inference, whether any noise resilience is considered, whether the initial state is required to be an eigenstate, and whether the likelihood function is fully tunable like the ELF proposed here or restricted to Chebyshev likelihood functions.

2. A First Example

There are two main strategies for estimating the expectation value 〈A|P|A〉 of some operator P with respect to a quantum state |A〉. The method of quantum amplitude estimation provides a provable quantum speedup with respect to certain computational models. However, to achieve precision ε in the estimate, the circuit depth needed in this method scales as O(1/ε), making it impractical for near-term quantum computers. The variational quantum eigensolver uses standard sampling to carry out amplitude estimation. Standard sampling allows for low-depth quantum circuits, making it more amenable to implementation on near-term quantum computers. However, in practice, the inefficiency of this method makes VQE impractical for many problems of interest. This section introduces a method of enhanced sampling for amplitude estimation, which may be used by embodiments of the present invention. This technique seeks to maximize the statistical power of noisy quantum devices. The method is described starting from a simple analysis of standard sampling as used in VQE.

The energy estimation subroutine of VQE estimates amplitudes with respect to Pauli strings. For a Hamiltonian decomposed into a linear combination of Pauli strings H = ∑_(j) µ_(j) P_(j) and “ansatz state” |A〉, the energy expectation value is estimated as a linear combination of Pauli expectation value estimates

$\begin{matrix} {\hat{E} = {\sum\limits_{j}{\mu_{j}{\hat{\text{Π}}}_{j}}},} & \text{­­­(3)} \end{matrix}$

where $\hat{\text{Π}}_{j}$ is the (amplitude) estimate of 〈A|P_(j)|A〉. VQE uses the standard sampling method to build up Pauli expectation value estimates with respect to the ansatz state, which can be summarized as the following:

Standard sampling:

1. Prepare |A〉 and measure operator P, receiving outcome d ∈ {0, 1}.

2. Repeat M times, receiving k outcomes labeled 0 and M - k outcomes labeled 1.

3. Estimate Π = 〈A|P|A〉 as

$\hat{\text{Π}} = \frac{k - \left( {M - k} \right)}{M}.$

The performance of this estimation strategy may be quantified using the mean squared error of the estimator as a function of time t = TM, where T is the time cost of each measurement. Because the estimator is unbiased, the mean squared error is simply the variance in the estimator,

$\begin{matrix} {\text{MSE}\left( \hat{\text{Π}} \right) = \frac{1 - \Pi^{2}}{M}.} & \text{(4)} \end{matrix}$

For a target mean squared error $\text{MSE}(\hat{\text{Π}})$ = ε², the runtime of the algorithm needed to reach that error is

$\begin{matrix} {t_{\varepsilon} = T\frac{1 - \Pi^{2}}{\varepsilon^{2}}.} & \text{­­­(5)} \end{matrix}$
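The standard sampling procedure and Equations 4 and 5 can be illustrated with a small classical simulation. This is a minimal sketch, assuming ideal (noiseless) measurements so that outcome 0 occurs with probability (1 + Π)/2; the function names and the chosen values of Π, M, and the trial count are hypothetical.

```python
import numpy as np

def standard_sampling_estimate(pi_true, num_shots, rng):
    """One run of standard sampling: return (k - (M - k)) / M."""
    # k counts the outcomes labeled 0, each occurring with probability (1 + Pi) / 2.
    k = rng.binomial(num_shots, (1.0 + pi_true) / 2.0)
    return (k - (num_shots - k)) / num_shots

def empirical_mse(pi_true, num_shots, num_trials, rng):
    """Mean squared error of the estimator over many independent runs."""
    estimates = np.array([standard_sampling_estimate(pi_true, num_shots, rng)
                          for _ in range(num_trials)])
    return float(np.mean((estimates - pi_true) ** 2))

def runtime_to_target(pi_true, eps, time_per_shot):
    """Equation 5: runtime needed to reach mean squared error eps**2."""
    return time_per_shot * (1.0 - pi_true**2) / eps**2

rng = np.random.default_rng(seed=0)
pi_true, num_shots = 0.3, 2000
print("empirical MSE:", empirical_mse(pi_true, num_shots, 5000, rng))
print("predicted MSE:", (1.0 - pi_true**2) / num_shots)
```

The empirical value should agree with (1 - Π²)/M up to statistical fluctuations from the finite number of trials.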

The total runtime of energy estimation in VQE is the sum of the runtimes of the individual Pauli expectation value estimations. For problems of interest, this runtime may be far too costly, even when certain parallelization techniques are used. The source of this cost is the insensitivity of the standard sampling estimation process to small deviations in Π: the expected information gain about Π contained in the standard-sampling measurement outcome data is low.

Generally, we can measure the information gain of an estimation process of M repetitions of standard sampling with the Fisher information

$\begin{matrix} \begin{matrix} {I_{M}\left( \text{Π} \right) = \mathbb{E}_{D}\left\lbrack \left( {\frac{\partial}{\partial\Pi}\log{\mathbb{P}}\left( {D\left| \text{Π} \right)} \right)} \right)^{2} \right\rbrack = - \mathbb{E}_{D}\left\lbrack {\frac{\partial^{2}}{\partial\Pi^{2}}\log{\mathbb{P}}\left( {D\left| \text{Π} \right)} \right)} \right\rbrack} \\ {= {\sum\limits_{D}\frac{1}{{\mathbb{P}}\left( {D|\Pi)} \right)}}\left( {\frac{\partial}{\partial\Pi}{\mathbb{P}}\left( {D\left| \text{Π} \right)} \right)} \right)^{2},} \end{matrix} & \text{­­­(6)} \end{matrix}$

where D = {d₁, d₂, ⋯, d_(M)} is the set of outcomes from M repetitions of the standard sampling. The Fisher information identifies the likelihood function ℙ(D|Π) as being responsible for information gain. A lower bound on the mean squared error of an (unbiased) estimator may be obtained with the Cramér-Rao bound

$\begin{matrix} {\text{MSE}\left( \hat{\text{Π}} \right) \geq \frac{1}{I_{M}(\Pi)}.} & \text{­­­(7)} \end{matrix}$

Using the fact that the Fisher information is additive in the number of samples, we have I_(M)(Π) = MI₁(Π), where I₁(Π) = 1/(1 - Π²) is the Fisher information of a single sample drawn from the likelihood function ℙ(d|Π) = (1 + (-1)^(d)Π)/2. Using the Cramér-Rao bound, a lower bound may be found for the runtime of the estimation process as

$\begin{matrix} {t_{\varepsilon} \geq \frac{T}{I_{1}(\Pi)\varepsilon^{2}},} & \text{­­­(8)} \end{matrix}$

which shows that in order to reduce the runtime of an estimation algorithm, embodiments of the present invention may increase the Fisher information.
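As a worked step connecting Equation 6 to the single-sample value I₁(Π) = 1/(1 - Π²), consider any binary likelihood of the form ℙ(d|Π) = (1 + (-1)^(d)Δ(Π))/2. The symbol Δ(Π) for the bias is notation introduced here for illustration only. Summing the two outcomes in Equation 6 gives

$\begin{aligned} I_{1}(\Pi) &= \sum\limits_{d \in \{ 0,1\}}\frac{1}{{\mathbb{P}}(d|\Pi)}\left( \frac{\partial{\mathbb{P}}(d|\Pi)}{\partial\Pi} \right)^{2} \\ &= \frac{\Delta'(\Pi)^{2}}{4}\left( \frac{2}{1 + \Delta(\Pi)} + \frac{2}{1 - \Delta(\Pi)} \right) = \frac{\Delta'(\Pi)^{2}}{1 - \Delta(\Pi)^{2}}. \end{aligned}$

For standard sampling, Δ(Π) = Π and Δ′(Π) = 1, which recovers I₁(Π) = 1/(1 - Π²). The same expression shows that a likelihood function whose bias has a steeper slope in Π yields a larger Fisher information.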

One purpose of enhanced sampling is to reduce the runtime of overlap estimation by engineering likelihood functions which increase the rate of information gain. Consider the simplest case of enhanced sampling, which is illustrated in FIGS. 5A-5C. To generate data, embodiments of the present invention may prepare the ansatz state |A〉, apply the operation P, apply a phase flip about the ansatz state, and then measure P. The phase flip about the ansatz state may be achieved by applying the inverse of the ansatz circuit A⁻¹, applying a phase flip about the initial state R₀ = 2|0^(N)〉〈0^(N)| - I, and then re-applying the ansatz circuit A. In this case, the likelihood function becomes

$\begin{matrix} \begin{array}{r} {{\mathbb{P}}\left( d \middle| \Pi \right) = \frac{1 + \left( - 1 \right)^{d}\cos\left( 3\arccos(\Pi) \right)}{2}} \\ {= \frac{1 + \left( - 1 \right)^{d}\left( 4\Pi^{3} - 3\Pi \right)}{2}.} \end{array} & \text{(9)} \end{matrix}$

The bias is a degree-3 Chebyshev polynomial in Π. The disclosure herein will refer to such likelihood functions as Chebyshev likelihood functions (CLFs).
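The degree-3 bias can be verified on a toy instance by direct matrix arithmetic. This is a minimal sketch, assuming a single qubit, P equal to the Pauli Z operator, and a hypothetical real-rotation ansatz circuit A parametrized by an angle theta; none of these specific choices come from the disclosure.

```python
import numpy as np

def enhanced_sampling_bias(theta):
    """Return (Pi, bias) for a one-qubit toy instance of the simplest enhanced circuit."""
    zero = np.array([1.0, 0.0])
    # Hypothetical ansatz circuit A: A|0> = cos(theta/2)|0> + sin(theta/2)|1>.
    A = np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                  [np.sin(theta / 2),  np.cos(theta / 2)]])
    P = np.diag([1.0, -1.0])                       # Pauli Z as the observable
    R0 = 2.0 * np.outer(zero, zero) - np.eye(2)    # phase flip about |0>
    ansatz = A @ zero
    pi = float(ansatz @ P @ ansatz)                # Pi = <A|P|A>
    # Apply P, then the phase flip about |A> (A R0 A^-1; A is real, so A^-1 = A.T).
    state = A @ R0 @ A.T @ P @ ansatz
    bias = float(state @ P @ state)                # P(d=0) - P(d=1) when measuring P
    return pi, bias

for theta in (0.4, 1.0, 2.3):
    pi, bias = enhanced_sampling_bias(theta)
    print(f"Pi = {pi:+.4f}, bias = {bias:+.4f}, 4*Pi^3 - 3*Pi = {4 * pi**3 - 3 * pi:+.4f}")
```

In this toy setting the measured bias coincides with 4Π³ - 3Π, the bias appearing in Equation 9.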

In order to compare the Chebyshev likelihood function of enhanced sampling to that of standard sampling, consider the case of Π = 0. Here, ℙ(0|Π = 0) = ℙ(1|Π = 0) and so the Fisher information is proportional to the square of the slope of the likelihood function

$\begin{matrix} {I_{1}\left( \Pi = 0 \right) = 4\left( \left. \frac{\partial{\mathbb{P}}\left( d = 0 \middle| \Pi \right)}{\partial\Pi} \right|_{\Pi = 0} \right)^{2}.} & \text{(10)} \end{matrix}$

As seen in FIG. 5B, the slope of the Chebyshev likelihood function at Π = 0 is steeper than that of the standard sampling likelihood function. The single sample Fisher information in each case evaluates to

$\begin{matrix} \begin{array}{l} {\text{Standard: } I_{1}\left( \Pi = 0 \right) = 1} \\ {\text{Enhanced: } I_{1}\left( \Pi = 0 \right) = 9,} \end{array} & \text{(11)} \end{matrix}$

demonstrating how a simple variant of the quantum circuit may enhance information gain. In this example, using the simplest case of enhanced sampling may reduce the number of measurements needed to achieve a target error by at least a factor of nine. As will be discussed later, embodiments of the present invention may further increase the Fisher information by applying L layers of PA^(†)R₀A before measuring P. In fact, the Fisher information

$I_{1}\left( \text{Π} \right) = \frac{\left( {2L + 1} \right)^{2}}{1 - \Pi^{2}} = O\left( L^{2} \right)$

grows quadratically in L.
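The quadratic growth can be checked numerically from the definition of the Fisher information. The sketch below assumes the noiseless Chebyshev bias cos((2L + 1) arccos Π) for L layers and uses a central finite difference for the slope; the function names and the step size are hypothetical choices.

```python
import numpy as np

def clf_bias(pi, num_layers):
    """Noiseless Chebyshev bias for L layers: cos((2L + 1) arccos(Pi))."""
    return np.cos((2 * num_layers + 1) * np.arccos(pi))

def fisher_information(bias_fn, pi, step=1e-6):
    """Single-sample Fisher information of P(d|Pi) = (1 + (-1)^d * bias(Pi)) / 2."""
    def prob0(x):
        return 0.5 * (1.0 + bias_fn(x))
    # Central finite difference for dP(0|Pi)/dPi; dP(1|Pi)/dPi differs only in sign.
    slope = (prob0(pi + step) - prob0(pi - step)) / (2.0 * step)
    p0 = prob0(pi)
    # Sum over d in {0, 1} of (dP/dPi)^2 / P, as in Equation 6 with M = 1.
    return slope**2 / p0 + slope**2 / (1.0 - p0)

pi = 0.3
for num_layers in range(4):
    numeric = fisher_information(lambda x: clf_bias(x, num_layers), pi)
    closed_form = (2 * num_layers + 1) ** 2 / (1.0 - pi**2)
    print(f"L = {num_layers}: numeric = {numeric:.4f}, closed form = {closed_form:.4f}")
```

Setting L = 0 reproduces the standard sampling value 1/(1 - Π²).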

An estimation scheme that converts enhanced sampling measurement data into an estimate has not yet been described. One intricacy that enhanced sampling introduces is the option to vary L while measurement data are being collected. In this case, given a set of measurement outcomes from circuits with varying L, the sample mean of the 0 and 1 counts loses its meaning. Instead of using the sample mean, embodiments of the present invention may use Bayesian inference to process the measurement outcomes into information about Π. Section 4 describes certain embodiments which use Bayesian inference for estimation.
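To illustrate how Bayesian inference can combine outcomes from circuits with different L, the following is a minimal sketch using a discretized posterior over Π. It assumes noiseless Chebyshev likelihood functions, a flat prior on a grid, simulated measurement outcomes, and a hypothetical schedule that cycles through L = 0, 1, 2; it is not the inference algorithm of any particular embodiment.

```python
import numpy as np

def chebyshev_likelihood(outcome, pi, num_layers):
    """P(d|Pi) for L layers, with noiseless bias cos((2L + 1) arccos(Pi))."""
    bias = np.cos((2 * num_layers + 1) * np.arccos(pi))
    return 0.5 * (1.0 + (-1) ** outcome * bias)

def bayes_update(grid, posterior, outcome, num_layers):
    """Multiply the current belief by the likelihood of the outcome and renormalize."""
    posterior = posterior * chebyshev_likelihood(outcome, grid, num_layers)
    return posterior / posterior.sum()

def run_inference(pi_true, num_shots, rng, grid_size=2001):
    grid = np.linspace(-1.0, 1.0, grid_size)
    posterior = np.full(grid.size, 1.0 / grid.size)    # flat prior over Pi
    for shot in range(num_shots):
        num_layers = shot % 3                          # hypothetical schedule: L = 0, 1, 2
        p0 = chebyshev_likelihood(0, pi_true, num_layers)
        outcome = 0 if rng.random() < p0 else 1        # simulated measurement outcome
        posterior = bayes_update(grid, posterior, outcome, num_layers)
    mean = float(np.sum(grid * posterior))
    std = float(np.sqrt(np.sum((grid - mean) ** 2 * posterior)))
    return mean, std

mean, std = run_inference(pi_true=0.42, num_shots=600, rng=np.random.default_rng(seed=1))
print(f"posterior mean = {mean:.4f}, posterior std = {std:.4f}")
```

Because each outcome enters only through its own likelihood function, no common sample mean across different L is needed. Including L = 0 in the schedule is a design choice made here to help resolve the multiple values of Π that give the same bias at higher L.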

At this point, one may be tempted to point out that the comparison between standard sampling and enhanced sampling is unfair because only one query to A is used in the standard sampling case while the enhanced sampling scheme uses three queries of A. It seems that if one considers a likelihood function that arises from three standard sampling steps, one could also yield a cubic polynomial form in the likelihood function. Indeed, suppose one performs three independent standard sampling steps yielding results x₁, x₂, x₃ ∈ {0,1}, and produces a binary outcome z ∈ {0,1} classically by sampling from a distribution ℙ(z|x₁, x₂, x₃). Then the likelihood function takes the form of

$\begin{matrix} \begin{array}{r} {{\mathbb{P}}\left( z \middle| \Pi \right) = {\sum\limits_{x_{1},x_{2},x_{3}}{{\mathbb{P}}\left( z \middle| x_{1},x_{2},x_{3} \right){\mathbb{P}}\left( x_{1},x_{2},x_{3} \middle| \Pi \right)}}} \\ {= {\sum\limits_{i = 0}^{3}{\alpha_{i}\binom{3}{i}\left( \frac{1 - \Pi}{2} \right)^{i}\left( \frac{1 + \Pi}{2} \right)^{3 - i}}},} \end{array} & \text{(12)} \end{matrix}$

where each α_(i) ∈ [0,1] is a parameter that can be tuned classically through changing the distribution ℙ(z|x₁, x₂, x₃). More specifically,

$\alpha_{i} = \binom{3}{i}^{- 1}\sum\limits_{x_{1}x_{2}x_{3}\,:\,h(x_{1}x_{2}x_{3}) = i}{\mathbb{P}}\left( z \middle| x_{1},x_{2},x_{3} \right),$

where h(x₁x₂x₃) is the Hamming weight of the bit string x₁x₂x₃. Suppose it is desired that ℙ(z = 0|Π) be equal to ℙ(d = 0|Π) in Equation 9. This implies that α₀ = 1, α₁ = -2, α₂ = 3 and α₃ = 0. Because α₁ and α₂ lie outside the interval [0,1], these values are beyond the classical tunability of the likelihood function in Equation 12. This is evidence that the likelihood function arising from the quantum scheme in Equation 9 is beyond classical means.
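The coefficients quoted above can be recovered by solving a small linear system. The sketch below assumes the convention that i counts the outcomes labeled 1 (the Hamming weight), so that the i-th term carries the factor ((1 - Π)/2)^i ((1 + Π)/2)^(3 - i) together with the binomial coefficient; the function names and the choice of sample points are hypothetical.

```python
import numpy as np
from math import comb

def mixture_coefficients(target, degree=3):
    """Solve sum_i alpha_i * C(n, i) * q^i * p^(n - i) = target(Pi) for the alpha_i."""
    pis = np.linspace(-0.9, 0.9, degree + 1)        # any degree + 1 distinct points work
    p = (1.0 + pis) / 2.0                           # probability of an outcome labeled 0
    q = (1.0 - pis) / 2.0                           # probability of an outcome labeled 1
    basis = np.stack([comb(degree, i) * q**i * p**(degree - i)
                      for i in range(degree + 1)], axis=1)
    return np.linalg.solve(basis, target(pis))

def quantum_p0(pi):
    """P(d = 0 | Pi) from Equation 9."""
    return 0.5 * (1.0 + 4.0 * pi**3 - 3.0 * pi)

alphas = mixture_coefficients(quantum_p0)
print("alpha_0..alpha_3:", np.round(alphas, 6))
print("all within [0, 1]:", bool(np.all((alphas >= 0.0) & (alphas <= 1.0))))
```

Under this convention the solved coefficients match the values α₀ = 1, α₁ = -2, α₂ = 3, α₃ = 0, and two of them fall outside [0,1], so no classical post-processing distribution ℙ(z|x₁, x₂, x₃) reproduces the likelihood function of Equation 9.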

As the number of circuit layers L is increased, the time per sample T grows linearly in L. This linear growth in circuit layer number, along with the quadratic growth in Fisher information leads to a lower bound on the expected runtime,

$\begin{matrix} {t_{\varepsilon} \in \text{Ω}\left( \frac{1}{L\varepsilon^{2}} \right),} & \text{­­­(13)} \end{matrix}$

assuming a fixed-L estimation strategy with an unbiased estimator. In practice, the operations implemented on the quantum computer are subject to error. Fortunately, embodiments of the present invention may use Bayesian inference, which can incorporate such errors into the estimation process. As long as the influence of errors on the form of the likelihood function is accurately modeled, the principal effect of such errors is only to slow the rate of information gain. Error in the quantum circuit accumulates as the number of circuit layers L is increased. Consequently, beyond a certain number of circuit layers, diminishing returns will be received with respect to gains in Fisher information (or the reduction in runtime). The estimation algorithm may then seek to balance these competing factors in order to optimize the overall performance.

The introduction of error poses another issue for estimation. Without error, the Fisher information gain per sample in the enhanced sampling case with L = 1 is greater than or equal to 9 for all Π. As shown in FIGS. 6A-6B, with the introduction of even a small degree of error, the values of Π where the likelihood function is flat incur a dramatic drop in Fisher information. Such regions are referred to herein as estimation dead spots. This observation motivates the concept of engineering likelihood functions (ELF) to increase their statistical power. By promoting the P and R₀ operations to generalized reflections U(x₁) = exp (-ix₁P) and R₀(x₂) = exp (-ix₂R₀), embodiments of the present invention may use rotation angles such that the information gain is boosted around such dead spots. Even for deeper enhanced sampling circuits, engineering likelihood functions allows embodiments of the present invention to mitigate the effect of estimation dead spots.
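By way of non-limiting illustration, the following sketch evaluates the Fisher information about Π for the L = 1 case. It assumes that the likelihood bias is cos(3θ) with θ = arccos(Π), and that noise merely rescales the bias by a fidelity factor f, which is the depolarizing form used later in this disclosure; the noise model behind FIGS. 6A-6B may differ. The function name and the sample values of Π and f are hypothetical.

```python
import math

def fisher_info_wrt_pi(pi_value, f, num_layers=1):
    """Fisher information about Pi for a bias f*cos(m*theta), m = 2L + 1.

    Assumes P(d|Pi) = (1 + (-1)^d * f * cos(m*arccos(Pi))) / 2 with -1 < Pi < 1.
    """
    m = 2 * num_layers + 1
    theta = math.acos(pi_value)
    bias = math.cos(m * theta)
    # d/dPi cos(m*arccos(Pi)) = m*sin(m*theta)/sin(theta)
    dbias_dpi = m * math.sin(m * theta) / math.sin(theta)
    return (f * dbias_dpi) ** 2 / (1.0 - (f * bias) ** 2)

# Without error the information equals 9 / (1 - Pi^2), which is at least 9.
noiseless = fisher_info_wrt_pi(0.3, f=1.0)
# The cubic bias is flat at Pi = 0.5, so a little noise removes the information there.
near_dead_spot = fisher_info_wrt_pi(0.5, f=0.99)
# Away from the flat region the same noise level costs comparatively little.
away_from_dead_spot = fisher_info_wrt_pi(0.3, f=0.99)
```

Under these assumptions the dead spots for L = 1 sit at Π = ±1/2, where the derivative of the bias with respect to Π vanishes.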

3. Engineered Likelihood Functions

This section describes a methodology of engineering likelihood functions for amplitude estimation which may be used by embodiments of the present invention. First, quantum circuits for drawing samples that correspond to engineered likelihood functions are described, and then techniques for tuning the circuit parameters and carrying out Bayesian inference with the resultant likelihood functions are described.

3.1. Quantum Circuits for Engineered Likelihood Functions

Techniques will now be described for designing, implementing, and executing a procedure on a computer (e.g., a quantum computer or a hybrid quantum-classical computer) for estimating the expectation value

$\begin{matrix} {\text{Π} = \cos(\theta) = \left\langle {A|P|A} \right\rangle,} & \text{­­­(18)} \end{matrix}$

where |A〉 = A|0^(n)〉 in which A is an n-qubit unitary operator, P is an n-qubit Hermitian operator with eigenvalues ±1, and θ = arccos (Π) is introduced to facilitate Bayesian inference later on. In constructing the estimation algorithms disclosed herein, it may be assumed that embodiments of the present invention are able to perform the following primitive operations. First, embodiments of the present invention may prepare the computational basis state |0^(n)〉 and apply a quantum circuit A to it, obtaining |A〉 = A|0〉^(⊗n). Second, embodiments of the present invention may implement the unitary operator U(x) = exp (-ixP) for any angle x ∈ ℝ. Finally, embodiments of the present invention may perform the measurement of P, which is modeled as a projection-valued measure

$\left\{ {\frac{I + P}{2},\frac{I - P}{2}} \right\}$

with respective outcome labels {0,1}. Embodiments of the present invention may also make use of the unitary operator V(y) = AR₀(y)A^(†), where R₀(y) = exp (- iy(2|0^(n)〉〈0^(n)| - I)) and y ∈ ℝ. Following the convention, U(x) and V(y) will be referred to herein as the generalized reflections about the +1 eigenspace of P and the state |A〉, respectively, where x and y are the angles of these generalized reflections, respectively.

Embodiments of the present invention may use the ancilla-free¹ quantum circuit in FIG. 7 to generate the engineered likelihood function (ELF), which is the probability distribution of the outcome d ∈ {0,1} given the unknown quantity θ to be estimated. The circuit may, for example, include a sequence of generalized reflections. Specifically, after preparing the ansatz state |A〉 = A|0〉^(⊗n), embodiments of the present invention may apply 2L generalized reflections U(x₁), V(x₂), ..., U(x_(2L-1)), V(x_(2L)) to it, varying the rotation angle x_(j) in each operation. For convenience, V(x_(2j))U(x_(2j-1)) will be referred to herein as the j-th layer of the circuit, for j = 1,2, ...,L. The output state of this circuit is

$\begin{matrix} {Q\left( \overset{\rightarrow}{x} \right)\left| A \right\rangle = V\left( x_{2L} \right)U\left( x_{2L - 1} \right)\ldots V\left( x_{2} \right)U\left( x_{1} \right)\left| A \right\rangle,} & \text{(19)} \end{matrix}$

where x = (x₁, x₂, ..., x_(2L-1), x_(2L)) ∈ ℝ^(2L) is the vector of tunable parameters. Finally, embodiments of the present invention may perform the projective measurement

$\left\{ {\frac{I + P}{2},\frac{I - P}{2}} \right\}$

on this state, receiving an outcome d ∈ {0,1}.

As in Grover’s algorithm, the generalized reflections U(x_(2j-1)) and V(x_(2j)) ensure that the quantum state remains in the two-dimensional subspace S := span{|A〉, P|A〉}² for any j. Let |A┴〉 be the state (unique, up to a phase) in S that is orthogonal to |A〉, i.e.

$\begin{matrix} {\left| A^{\bot} \right\rangle = \frac{P\left| A \right\rangle - \left\langle A \right|P\left| A \right\rangle\left| A \right\rangle}{\sqrt{1 - \left\langle {A|P|A} \right\rangle^{2}}}.} & \text{­­­(20)} \end{matrix}$

To help the analysis, we will view this two-dimensional subspace as a qubit, writing |A〉 and |A┴〉 as |0〉 and |1〉 respectively.

¹ We call this scheme “ancilla-free” (AF) since it does not involve any ancilla qubits. In Appendix A, we consider a different scheme named the “ancilla-based” (AB) scheme which involves one ancilla qubit.

² To ensure that S is two-dimensional, assume that Π ≠ ±1, i.e. θ ≠ 0 and θ ≠ π.

Let X, Y, Z and I be the Pauli operators and identity operator on this virtual qubit respectively. Then, focusing on the subspace S = span{|0〉, |1〉}, we can rewrite P as

$\begin{matrix} {P(\theta) = \cos(\theta)\overline{Z} + \sin(\theta)\overline{X},} & \text{­­­(21)} \end{matrix}$

and rewrite the generalized reflections U(x_(2j-1)) and V(x_(2j)) as

$\begin{matrix} {U\left( {\theta;x_{2j - 1}} \right) = \cos\left( x_{2j - 1} \right)\overline{I} - i\sin\left( x_{2j - 1} \right)\left( {\cos(\theta)\overline{Z} + \sin(\theta)\overline{X}} \right)} & \text{(22)} \end{matrix}$

and

$\begin{matrix} {V\left( x_{2j} \right) = \cos\left( x_{2j} \right)\overline{I} - \text{i sin}\left( x_{2j} \right)\overline{Z},} & \text{­­­(23)} \end{matrix}$

where x_(2j-1), x_(2j) ∈ ℝ are tunable parameters. Then the unitary operator Q(x) implemented by the L-layer circuit becomes

$\begin{matrix} {Q\left( {\theta;\overset{\rightarrow}{x}} \right) = V\left( x_{2L} \right)U\left( {\theta;x_{2L - 1}} \right)\ldots V\left( x_{2} \right)U\left( {\theta;x_{1}} \right).} & \text{­­­(24)} \end{matrix}$

Note that in this picture, |A〉 = |0〉 is fixed, while P = P(θ), U(x) = U(θ;x) and Q(x) = Q(θ;x) depend on the unknown quantity θ. It turns out to be more convenient to design and analyze the estimation algorithms in this “logical” picture than in the original “physical” picture. Therefore, this picture will be used for the remainder of this disclosure.

The engineered likelihood function (i.e. the probability distribution of measurement outcome d ∈ {0,1}) depends on the output state ρ(θ;x) of the circuit and the observable P(θ).

Precisely, it is

$\begin{matrix} {\mathbb{P}\left( {d|\theta;\overset{\rightarrow}{x}} \right) = \frac{1 + \left( {- 1} \right)^{d}\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)}{2},} & \text{(25)} \end{matrix}$

where

$\begin{matrix} {\Delta\left( {\theta;\overset{\rightarrow}{x}} \right) = \left\langle \overline{0} \right|Q\left( {\theta;\overset{\rightarrow}{x}} \right)^{\dagger}P(\theta)Q\left( {\theta;\overset{\rightarrow}{x}} \right)\left| \overline{0} \right\rangle} & \text{(26)} \end{matrix}$

is the bias of the likelihood function (from now on, we will use ℙ′(d|θ;x) and Δ′(θ;x) to denote the derivatives of ℙ(d|θ; x) and Δ(θ;x) with respect to θ, respectively). In particular, if x =

$\left( {\frac{\pi}{2},\frac{\pi}{2},\ldots,\frac{\pi}{2},\frac{\pi}{2}} \right),$

then we have Δ(θ;x) = cos ((2L + 1)θ). Namely, the bias of the likelihood function for this x is the Chebyshev polynomial of degree 2L + 1 (of the first kind) of Π. For this reason, the likelihood function for this x will be referred to herein as the Chebyshev likelihood function (CLF). Section 5 will explore the performance gap between CLFs and general ELFs.
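By way of non-limiting illustration, the bias of Eq. (26) may be simulated classically on the virtual qubit by applying Eqs. (22) and (23) to a two-component state vector. The sketch below is a minimal classical simulation in the logical picture, not an implementation of the circuit in FIG. 7; the function name `elf_bias` and the sample values are hypothetical.

```python
import cmath
import math

def elf_bias(theta, angles):
    """Bias Delta(theta; x) of Eq. (26), evaluated on the virtual qubit.

    angles = [x_1, ..., x_2L]; x_1, x_3, ... drive U and x_2, x_4, ... drive V.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = 1.0 + 0j, 0.0 + 0j  # amplitudes on |0-bar> and |1-bar>
    for j in range(0, len(angles), 2):
        x_u, x_v = angles[j], angles[j + 1]
        # P(theta) = cos(theta) Z + sin(theta) X applied to (a, b)
        pa, pb = c * a + s * b, s * a - c * b
        # U(theta; x) = cos(x) I - i sin(x) P(theta)
        a = math.cos(x_u) * a - 1j * math.sin(x_u) * pa
        b = math.cos(x_u) * b - 1j * math.sin(x_u) * pb
        # V(x) = cos(x) I - i sin(x) Z = diag(e^{-ix}, e^{+ix})
        a, b = cmath.exp(-1j * x_v) * a, cmath.exp(1j * x_v) * b
    pa, pb = c * a + s * b, s * a - c * b
    return (a.conjugate() * pa + b.conjugate() * pb).real

# With every angle set to pi/2 the bias reduces to cos((2L + 1) * theta).
theta_example = 0.7
clf_bias_two_layers = elf_bias(theta_example, [math.pi / 2] * 4)
chebyshev_value = math.cos(5 * theta_example)
```

The two values agree to numerical precision, which is the Chebyshev property stated above for L = 2.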

In reality, quantum devices are subject to noise. To make the estimation process robust against errors, embodiments of the present invention may incorporate the following noise model into the likelihood function.

In practice, the establishment of the noise model may leverage a procedure for calibrating the likelihood function for the specific device being used. With respect to Bayesian inference, the parameters of this model are known as nuisance parameters; the target parameter does not depend directly on them, but they determine how the data relates to the target parameter and, hence, may be incorporated into the inference process. The remainder of this disclosure will assume that the noise model has been calibrated to sufficient precision so as to render the effect of model error negligible.

Assume that the noisy version of each circuit layer V(x_(2j))U(θ;x_(2j-1)) implements a mixture of the target operation and the completely depolarizing channel ³ acting on the same input state, i.e.

$\begin{matrix} {U_{j}(\rho) = pV\left( x_{2j} \right)U\left( {\theta;x_{2j - 1}} \right)\rho U\left( {\theta;x_{2j - 1}} \right)^{\dagger}V\left( x_{2j} \right)^{\dagger} + \left( {1 - p} \right)\frac{I}{2^{n}},} & \text{­­­(27)} \end{matrix}$

where p is the fidelity of this layer. Under composition of such imperfect operations, the output state of the L-layer circuit becomes

$\begin{matrix} {\rho_{L} = p^{L}Q\left( {\theta;\overset{\rightarrow}{x}} \right)\left| A \right\rangle\left\langle A \right|Q\left( {\theta;\overset{\rightarrow}{x}} \right)^{\dagger} + \left( {1 - p^{L}} \right)\frac{I}{2^{n}}.} & \text{­­­(28)} \end{matrix}$

This imperfect circuit is preceded by an imperfect preparation of |A〉 and followed by an imperfect measurement of P. In the context of randomized benchmarking, such errors are referred to as state preparation and measurement (SPAM) errors. Embodiments of the present invention may also model SPAM error with a depolarizing model, taking the noisy preparation of |A〉 to be

$p_{SP}\left| A \right\rangle\left\langle A \right| + \left( {1 - p_{SP}} \right)\frac{I}{2^{n}}$

and taking the noisy measurement of P to be the POVM

$\left\{ {p_{M}\frac{I + P}{2} + \left( {1 - p_{M}} \right)\frac{I}{2},\; p_{M}\frac{I - P}{2} + \left( {1 - p_{M}} \right)\frac{I}{2}} \right\}.$

³ The depolarizing model assumes that the gates comprising each layer are sufficiently random to prevent systematic build-up of coherent error. There exist techniques such as randomized compiling which make this depolarizing model more accurate.

Combining the SPAM error parameters into p̅ = p_(SP)p_(M), one arrives at a model for the noisy likelihood function

$\begin{matrix} {\mathbb{P}\left( {d|\theta;f,\overset{\rightarrow}{x}} \right) = \frac{1}{2}\left\lbrack {1 + \left( {- 1} \right)^{d}f\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)} \right\rbrack,} & \text{(29)} \end{matrix}$

where f = p̅p^(L) is the fidelity of the whole process for generating the ELF, and Δ(θ;x) is the bias of the ideal likelihood function as defined in Eq. (26) (from now on, this disclosure will use ℙ′(d|θ; f, x) to denote the derivative of ℙ(d|θ; f, x) with respect to θ). Note that the overall effect of noise on the ELF is that it rescales the bias by a factor of f. This implies that the less error-prone the generation process is, the steeper the resultant ELF is (which means it is more useful for Bayesian inference), as one would expect.
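By way of non-limiting illustration, the noisy likelihood of Eq. (29) may be evaluated and sampled classically once the ideal bias is known. The sketch below takes the ideal bias as an input and combines the layer fidelity p, the number of layers L, and the SPAM fidelity p̅ into f = p̅p^(L). The function names, argument names, and numerical values are hypothetical.

```python
import random

def noisy_likelihood(d, ideal_bias, layer_fidelity, num_layers, spam_fidelity=1.0):
    """P(d | theta; f, x) of Eq. (29), with f = spam_fidelity * layer_fidelity**L."""
    f = spam_fidelity * layer_fidelity ** num_layers
    return 0.5 * (1.0 + (-1) ** d * f * ideal_bias)

def sample_outcome(ideal_bias, layer_fidelity, num_layers, spam_fidelity=1.0, rng=random):
    """Draw one outcome d in {0, 1} from the noisy likelihood."""
    p0 = noisy_likelihood(0, ideal_bias, layer_fidelity, num_layers, spam_fidelity)
    return 0 if rng.random() < p0 else 1

# Example: ideal bias 0.8, three layers of fidelity 0.9, SPAM fidelity 0.95.
p0_example = noisy_likelihood(0, 0.8, 0.9, 3, 0.95)
p1_example = noisy_likelihood(1, 0.8, 0.9, 3, 0.95)
outcome_example = sample_outcome(0.8, 0.9, 3, 0.95, rng=random.Random(0))
```

As f approaches zero both outcome probabilities approach 1/2, so a single sample carries almost no information about θ.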

Before moving on to the discussion of Bayesian inference with ELFs, it is worth mentioning the following property of engineered likelihood functions, as it will play a role in Section 4. The concepts of trigono-multilinear and trigono-multiquadratic functions are known. Basically, a multivariable function f: ℝ^(k) → ℂ is trigono-multilinear if for any j ∈ {1,2, ..., k}, f(x₁, x₂, ..., x_(k)) may be written as

$\begin{matrix} {f\left( {x_{1},x_{2},\ldots,x_{k}} \right) = C_{j}\left( {\overset{\rightarrow}{x}}_{\neg j} \right)\cos\left( x_{j} \right) + S_{j}\left( {\overset{\rightarrow}{x}}_{\neg j} \right)\sin\left( x_{j} \right),} & \text{­­­(30)} \end{matrix}$

for some (complex-valued) functions C_(j) and S_(j) of x _(¬j): = (x₁, ..., x_(j-1), x_(j+1), ..., x_(k)), and we call C_(j) and S_(j) the cosine-sine-decomposition (CSD) coefficient functions of f with respect to x_(j). Similarly, a multivariable function f:ℝ^(k) → ℂ is trigono-multiquadratic if for any j ∈ {1,2, ..., k}, f(x₁, x₂, ..., x_(k)) may be written as

$\begin{matrix} \begin{array}{l} {f\left( {x_{1},x_{2},\ldots,x_{k}} \right) =} \\ {C_{j}\left( {\overset{\rightarrow}{x}}_{\neg j} \right)\cos\left( {2x_{j}} \right) + S_{j}\left( {\overset{\rightarrow}{x}}_{\neg j} \right)\sin\left( {2x_{j}} \right) + B_{j}\left( {\overset{\rightarrow}{x}}_{\neg j} \right),} \end{array} & \text{­­­(31)} \end{matrix}$

for some (complex-valued) functions C_(j), S_(j) and B_(j) of x _(¬j): = (x₁, ..., x_(j-1), x_(j+1), ..., x_(k)), and we call C_(j), S_(j) and B_(j) the cosine-sine-bias-decomposition (CSBD) coefficient functions of f with respect to x_(j). The concepts of trigono-multilinearity and trigono-multiquadraticity can also be naturally generalized to linear operators. Namely, a linear operator is trigono-multilinear (or trigono-multiquadratic) in a set of variables if each entry of this operator (written in an arbitrary basis) is trigono-multilinear (or trigono-multiquadratic) in the same variables. Now Eqs. (22), (23) and (24) imply that Q(θ; x) is a trigono-multilinear operator of x. Then it follows from Eq. (26) that Δ(θ;x) is a trigono-multiquadratic function of x. Furthermore, it is disclosed that the CSBD coefficient functions of Δ(θ;x) with respect to any x_(j) may be evaluated in O(L) time, and this greatly facilitates the construction of the algorithms in Section 4.1 for tuning the circuit angles x = (x₁, x₂, ..., x_(2L-1), x_(2L)).
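By way of non-limiting illustration, the trigono-multiquadratic form of Eq. (31) may be checked numerically. For fixed values of the other angles, Eq. (31) has three unknowns, so evaluating the bias at x_(j) = 0, π/4 and π/2 determines C_(j), S_(j) and B_(j). This three-point fit is an assumption made for illustration only and is not the procedure of Appendix C. The helper names are hypothetical, the bias is simulated classically on the virtual qubit, and j is 0-based in the code.

```python
import cmath
import math

def elf_bias(theta, angles):
    """Classical simulation of the bias Delta(theta; x) on the virtual qubit."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = 1.0 + 0j, 0.0 + 0j
    for k in range(0, len(angles), 2):
        x_u, x_v = angles[k], angles[k + 1]
        pa, pb = c * a + s * b, s * a - c * b            # P(theta) on (a, b)
        a = math.cos(x_u) * a - 1j * math.sin(x_u) * pa  # U(theta; x_u)
        b = math.cos(x_u) * b - 1j * math.sin(x_u) * pb
        a, b = cmath.exp(-1j * x_v) * a, cmath.exp(1j * x_v) * b  # V(x_v)
    pa, pb = c * a + s * b, s * a - c * b
    return (a.conjugate() * pa + b.conjugate() * pb).real

def csbd_coefficients(theta, angles, j):
    """Return (C_j, S_j, B_j) from the bias sampled at x_j = 0, pi/4, pi/2."""
    def bias_at(value):
        trial = list(angles)
        trial[j] = value
        return elf_bias(theta, trial)
    d0, d1, d2 = bias_at(0.0), bias_at(math.pi / 4), bias_at(math.pi / 2)
    big_b = 0.5 * (d0 + d2)   # cos(2x) terms cancel between x = 0 and x = pi/2
    big_c = 0.5 * (d0 - d2)
    big_s = d1 - big_b        # cos(2x) = 0 and sin(2x) = 1 at x = pi/4
    return big_c, big_s, big_b

theta_example = 0.9
angles_example = [0.3, 1.1, -0.7, 2.0]
c2, s2, b2 = csbd_coefficients(theta_example, angles_example, 2)
reconstructed = (c2 * math.cos(2 * angles_example[2])
                 + s2 * math.sin(2 * angles_example[2]) + b2)
direct = elf_bias(theta_example, angles_example)
```

Once the three coefficients for a given angle are known, the bias along that coordinate is available in closed form, which is the property that coordinate-wise tuning of the angles relies on.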

3.2. Bayesian Inference With Engineered Likelihood Functions

With the model of (noisy) engineered likelihood functions in place, embodiments of the present invention will be described for tuning the circuit parameters x and performing Bayesian inference with the resultant likelihood functions for amplitude estimation.

Let us begin with a high-level overview of embodiments of an algorithm for estimating Π = cos (θ) = 〈A|P|A〉. For convenience, such embodiments may work with θ = arccos (Π) rather than with Π. Embodiments of the present invention may use a Gaussian distribution to represent knowledge of θ and make this distribution gradually converge to the true value of θ as the inference process proceeds. Embodiments of the present invention may start with an initial distribution of Π (which can be generated by standard sampling or domain knowledge) and convert it to the initial distribution of θ. Then embodiments of the present invention may iterate the following procedure until a convergence criterion is satisfied. At each round, embodiments of the present invention may find the circuit parameters x that maximize the information gain from the measurement outcome d in a certain sense (based on current knowledge of θ). Then the quantum circuit in FIG. 7 is executed with the optimized parameters x and a measurement outcome d ∈ {0,1} is received. Finally, embodiments of the present invention may update the distribution of θ by using Bayes' rule, conditioned on d. Once this loop is finished, embodiments of the present invention may convert the final distribution of θ to the final distribution of Π, and set the mean of this distribution as the final estimate of Π. See FIG. 8 for a conceptual diagram of this algorithm.

The following describes each component of the above algorithm in more detail. Throughout the inference process, embodiments of the present invention use a Gaussian distribution to keep track of a belief of the value of θ. Namely, at each round, θ has prior distribution

$\begin{matrix} {\, p(\theta) = p\left( {\theta;\mu,\sigma} \right): = \frac{1}{\sqrt{2\pi}\sigma}\text{e}^{- \frac{{({\theta - \mu})}^{2}}{2\sigma^{2}}}} & \text{­­­(32)} \end{matrix}$

for some prior mean µ ∈ ℝ and prior variance σ² ∈ ℝ⁺. After receiving the measurement outcome d, embodiments of the present invention may compute the posterior distribution of θ by using Bayes rule

$\begin{matrix} {p\left( {\theta|d;f,\overset{\rightarrow}{x}} \right) = \frac{\mathbb{P}\left( {d|\theta;f,\overset{\rightarrow}{x}} \right)p(\theta)}{\mathbb{P}\left( {d;f,\overset{\rightarrow}{x}} \right)},} & \text{(33)} \end{matrix}$

where the normalization factor, or model evidence, is defined as ℙ(d; f, x) = ∫ d θℙ(d|θ; f, x)p(θ) (recall that f is the fidelity of the process for generating the ELF). Although the true posterior distribution will not be a Gaussian, embodiments of the present invention may approximate it as such. Following previous methodology, embodiments of the present invention may replace the true posterior with a Gaussian distribution of the same mean and variance ⁴, and set it as the prior of θ for the next round. Embodiments of the present invention may repeat this measurement-and-Bayesian-update procedure until the distribution of θ is sufficiently concentrated around a single value.
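By way of non-limiting illustration, one measurement-and-update round may be sketched with brute-force quadrature: the Gaussian prior of Eq. (32) is multiplied by the likelihood of Eq. (29), and the mean and standard deviation of the result define the next Gaussian prior. This direct integration is the slow route mentioned in the footnote to this paragraph and is shown only for clarity. The function name, the grid settings, and the Chebyshev bias used in the example are assumptions.

```python
import math

def gaussian_posterior_update(mu, sigma, d, f, bias, grid_points=4001, width=8.0):
    """Moment-match the posterior of Eq. (33) to a Gaussian.

    bias(theta) is the ideal bias Delta(theta; x); the integral is a Riemann
    sum over [mu - width*sigma, mu + width*sigma].
    """
    step = 2.0 * width * sigma / (grid_points - 1)
    z0 = z1 = z2 = 0.0
    for k in range(grid_points):
        t = -width * sigma + k * step               # t = theta - mu
        prior = math.exp(-0.5 * (t / sigma) ** 2)   # unnormalized Gaussian prior
        weight = prior * 0.5 * (1.0 + (-1) ** d * f * bias(mu + t))
        z0 += weight
        z1 += weight * t
        z2 += weight * t * t
    shift = z1 / z0
    variance = z2 / z0 - shift ** 2
    return mu + shift, math.sqrt(variance)

# Example: L = 1 Chebyshev bias, prior N(0.8, 0.1^2), fidelity 0.9, outcome d = 0.
posterior_mean, posterior_std = gaussian_posterior_update(
    0.8, 0.1, 0, 0.9, lambda theta: math.cos(3 * theta))
```

A full inference loop would repeat this update with a fresh outcome d at each round, feeding the returned mean and standard deviation back in as the next prior.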

Since the algorithm mainly works with θ and we are eventually interested in Π, embodiments of the present invention may make conversions between the estimators of θ and Π. This is done as follows. Suppose that at round t the prior distribution of θ is

N(μ_(t), σ_(t)²)

and the prior distribution of Π is

N(μ̂_(t), σ̂_(t)²)

(note that µ_(t), σ_(t), µ̂_(t) and σ̂_(t) are random variables as they depend on the history of random measurement outcomes up to time t). The estimators of θ and Π at this round are µ_(t) and µ̂_(t), respectively. Given the distribution

N(μ_(t), σ_(t)²)

of θ, embodiments of the present invention may compute the mean µ̂_(t) and variance

σ̂_(t)²

of cos (θ), and set

N(μ̂_(t), σ̂_(t)²)

as the distribution of Π. This step may be done analytically, as if X ~ N(µ, σ²), then

$\begin{matrix} {\mathbb{E}\left\lbrack {\cos(X)} \right\rbrack = e^{- \frac{\sigma^{2}}{2}}\cos(\mu),} & \text{­­­(34)} \end{matrix}$

$\begin{matrix} {\text{Var}\left\lbrack {\cos(X)} \right\rbrack = \frac{1}{2}\left( {1 - e^{- \sigma^{2}}} \right)\left( {1 - e^{- \sigma^{2}}\cos\left( {2\mu} \right)} \right).} & \text{(35)} \end{matrix}$

Conversely, given the distribution

N(μ̂_(t), σ̂_(t)²)

of Π, embodiments of the present invention may compute the mean µ_(t) and variance

σ_(t)²

of arccos (Π) (clipping Π to [-1,1]), and set

N(μ_(t), σ_(t)²)

as the distribution of θ. This step may be done numerically. Even though the cos x or arccos y function of a Gaussian variable is not truly Gaussian, embodiments of the present invention may approximate it as such and find that this has negligible impact on the performance of the algorithm.
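By way of non-limiting illustration, the two conversions may be sketched as follows. The forward direction uses the closed forms of Eqs. (34) and (35); the reverse direction integrates arccos numerically over the Gaussian distribution of Π, clipping Π to [-1,1]. The function names and grid settings are hypothetical, and the Riemann-sum quadrature is only one possible numerical method.

```python
import math

def cos_gaussian_moments(mu, sigma):
    """Mean and variance of cos(X) for X ~ N(mu, sigma^2), per Eqs. (34)-(35)."""
    decay = math.exp(-sigma ** 2)
    mean = math.exp(-sigma ** 2 / 2.0) * math.cos(mu)
    variance = 0.5 * (1.0 - decay) * (1.0 - decay * math.cos(2.0 * mu))
    return mean, variance

def arccos_gaussian_moments(mu_hat, sigma_hat, grid_points=4001, width=8.0):
    """Mean and variance of arccos(clip(Y)) for Y ~ N(mu_hat, sigma_hat^2)."""
    step = 2.0 * width * sigma_hat / (grid_points - 1)
    z0 = z1 = z2 = 0.0
    for k in range(grid_points):
        y = mu_hat - width * sigma_hat + k * step
        weight = math.exp(-0.5 * ((y - mu_hat) / sigma_hat) ** 2)
        value = math.acos(max(-1.0, min(1.0, y)))   # clip Pi to [-1, 1]
        z0 += weight
        z1 += weight * value
        z2 += weight * value * value
    mean = z1 / z0
    return mean, z2 / z0 - mean ** 2

# Round trip: theta ~ N(1.0, 0.05^2) -> Pi -> theta.
pi_mean, pi_var = cos_gaussian_moments(1.0, 0.05)
theta_mean, theta_var = arccos_gaussian_moments(pi_mean, math.sqrt(pi_var))
```

In this example the round trip returns a mean close to 1.0 and a standard deviation close to 0.05, consistent with the Gaussian approximation being mild when σ is small.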

⁴ Although embodiments of the present invention may compute the mean and variance of the posterior distribution p(θ|d; f, x) directly by definition, this approach is time-consuming, as it involves numerical integration. Instead, embodiments of the present invention may accelerate this process by taking advantage of certain property of engineered likelihood functions. See Section 4.2 for more details.

A method for tuning the circuit angles x may be implemented by embodiments of the present invention as follows. Ideally, the angles may be chosen so that the mean squared error (MSE) of the estimator µ_(t) of θ decreases as fast as possible as t grows. In practice, however, it is hard to compute this quantity directly, and embodiments of the present invention may resort to a proxy of its value. The MSE of an estimator is the sum of the variance of the estimator and the squared bias of the estimator. The squared bias of µ_(t) may be smaller than its variance, i.e. |E[µ_(t)] - θ*|² < Var (µ_(t)), where θ* is the true value of θ. The variance

σ_(t)²

of θ is often close to the variance of µ_(t), i.e.

σ_(t)² ≈ Var(μ_(t))

with high probability. Combining these facts, we know that MSE (µ_(t)) ≤

2σ_(t)²

with high probability. So embodiments of the present invention may find the parameters x that minimize the variance

σ_(t)²

of θ instead.

Specifically, suppose θ has prior distribution N(µ,σ²). Upon receiving the measurement outcome d ∈ {0,1}, the expected posterior variance of θ is

$\begin{matrix} {\mathbb{E}_{d}\left\lbrack {\text{Var}\left( {\theta|d;f,\overset{\rightarrow}{x}} \right)} \right\rbrack = \sigma^{2}\left( {1 - \sigma^{2}\frac{f^{2}\left( {\partial_{\mu}b\left( {\mu,\sigma;\overset{\rightarrow}{x}} \right)} \right)^{2}}{1 - f^{2}\left( {b\left( {\mu,\sigma;\overset{\rightarrow}{x}} \right)} \right)^{2}}} \right),} & \text{(36)} \end{matrix}$

where

$\begin{matrix} {b\left( {\mu,\sigma;\overset{\rightarrow}{x}} \right) = {\int_{- \infty}^{\infty}{\text{d}\theta p\left( {\theta;\mu,\sigma} \right)\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)}}} & \text{­­­(37)} \end{matrix}$

in which Δ(θ;x) is the bias of the ideal likelihood function as defined in Eq. (26), and f is the fidelity of the process for generating the likelihood function. A quantity for engineering likelihood functions is now introduced and referred to herein as the variance reduction factor,

$\begin{matrix} {V\left( {\mu,\sigma;f,\overset{\rightarrow}{x}} \right): = \frac{f^{2}\left( {\partial_{\mu}b\left( {\mu,\sigma;\overset{\rightarrow}{x}} \right)} \right)^{2}}{1 - f^{2}\left( {b\left( {\mu,\sigma;\overset{\rightarrow}{x}} \right)} \right)^{2}}.} & \text{­­­(38)} \end{matrix}$

Then we have

$\begin{matrix} {\mathbb{E}_{\text{d}}\left\lbrack {\text{Var}\left( {\theta\left| {d;f,\overset{\rightarrow}{x}} \right)} \right)} \right\rbrack = \sigma^{2}\left\lbrack {1 - \sigma^{2}V\left( {\mu,\sigma;f,\overset{\rightarrow}{x}} \right)} \right\rbrack.} & \text{­­­(39)} \end{matrix}$

The larger V is, the faster the variance of θ decreases on average. Furthermore, to quantify the growth rate (per time step) of the inverse variance of θ, the following quantity may be used

$\begin{matrix} {R\left( {\mu,\sigma;f,\overset{\rightarrow}{x}} \right): = \frac{1}{T(L)}\left( {\frac{1}{\mathbb{E}_{\text{d}}\left\lbrack {Var\left( {\theta\left| {d;f,\overset{\rightarrow}{x}} \right)} \right)} \right\rbrack} - \frac{1}{\sigma^{2}}} \right)} & \text{­­­(40)} \end{matrix}$

$\begin{matrix} {= \frac{1}{T(L)}\frac{V\left( {\mu,\sigma;f,\overset{\rightarrow}{x}} \right)}{1 - \sigma^{2}V\left( {\mu,\sigma;f,\overset{\rightarrow}{x}} \right)},} & \text{­­­(41)} \end{matrix}$

where T(L) is the time cost of an inference round. Note that R is a monotonic function of V for V ∈ (0,1). Therefore, when the number L of circuit layers is fixed, embodiments of the present invention may maximize R (with respect to x) by maximizing V. In addition, when σ is small, R is approximately proportional to V, i.e. R ≈ V/T(L). The remainder of this disclosure will assume that the ansatz circuit contributes most significantly to the duration of the overall circuit. We take T(L) to be proportional to the number of times the ansatz is invoked in the circuit, setting T(L) = 2L + 1, where time is in units of ansatz duration.
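By way of non-limiting illustration, the variance reduction factor of Eq. (38) and the rate of Eq. (41) may be evaluated numerically for any bias function. The sketch below computes b of Eq. (37) by a Riemann sum and obtains ∂_(µ)b from the identity ∂_(µ)p(θ;µ,σ) = p(θ;µ,σ)(θ - µ)/σ², which holds for the Gaussian prior of Eq. (32). The function names, grid settings, and example values are hypothetical.

```python
import math

def variance_reduction_factor(mu, sigma, f, bias, grid_points=4001, width=8.0):
    """V(mu, sigma; f, x) of Eq. (38) for a callable bias(theta)."""
    step = 2.0 * width * sigma / (grid_points - 1)
    norm = b = db = 0.0
    for k in range(grid_points):
        t = -width * sigma + k * step               # t = theta - mu
        weight = math.exp(-0.5 * (t / sigma) ** 2)  # unnormalized Gaussian prior
        delta = bias(mu + t)
        norm += weight
        b += weight * delta
        db += weight * delta * t / sigma ** 2       # integrand of d b / d mu
    b /= norm
    db /= norm
    return (f * db) ** 2 / (1.0 - (f * b) ** 2)

def inverse_variance_growth_rate(v, sigma, num_layers):
    """R of Eq. (41) with T(L) = 2L + 1 ansatz invocations per round."""
    return v / ((2 * num_layers + 1) * (1.0 - sigma ** 2 * v))

# Example: L = 1 Chebyshev bias cos(3*theta), prior N(0.8, 0.1^2), fidelity 0.9.
v_example = variance_reduction_factor(0.8, 0.1, 0.9, lambda theta: math.cos(3 * theta))
r_example = inverse_variance_growth_rate(v_example, 0.1, num_layers=1)
```

For the Chebyshev bias the integrals are known in closed form, b = exp(-9σ²/2)cos(3µ), which gives an independent check on the quadrature.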

Now techniques will be disclosed for finding the parameters x = (x₁, x₂, ..., x_(2L)) ∈ ℝ^(2L) that maximize the variance reduction factor V(µ, σ; f, x) for given µ ∈ ℝ, σ ∈ ℝ⁺ and f ∈ [0,1]. This optimization problem turns out to be difficult to solve in general. Fortunately, in practice, embodiments of the present invention may assume that the prior variance σ² of θ is small (e.g. at most 0.01), and in this case, V(µ, σ; f, x) may be approximated by the Fisher information of the likelihood function ℙ(d|θ; f, x) at θ = µ, i.e.

$\begin{matrix} {V\left( {\mu,\sigma;f,\overset{\rightarrow}{x}} \right) \approx J\left( {\mu;f,\overset{\rightarrow}{x}} \right),\text{ when }\sigma\text{ is small,}} & \text{(42)} \end{matrix}$

where

$\begin{matrix} {J\left( {\theta;f,\overset{\rightarrow}{x}} \right) = \mathbb{E}_{d}\left\lbrack \left( {\partial_{\theta}\log\mathbb{P}\left( {d|\theta;f,\overset{\rightarrow}{x}} \right)} \right)^{2} \right\rbrack} & \text{(43)} \end{matrix}$

$\begin{matrix} {= \frac{f^{2}\left( {\Delta^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)^{2}}{1 - f^{2}\left( {\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)^{2}}} & \text{­­­(44)} \end{matrix}$

is the Fisher information of the two-outcome likelihood function ℙ(d|θ;f,x) as defined in Eq. (29). Therefore, rather than directly optimizing the variance reduction factor V(µ,σ;f,x), embodiments of the present invention may optimize the Fisher information J(µ; f, x), which may be done efficiently by embodiments of the present invention using the algorithms in Section 4.1.1. Furthermore, when the fidelity f of the process for generating the ELF is low, we have J(θ; f, x) ≈ f²(Δ′(θ; x))². It follows that

$\begin{matrix} {V\left( {\mu,\sigma;f,\overset{\rightarrow}{x}} \right) \approx f^{2}\left( {\Delta^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right)} \right)^{2},\text{ when both }\sigma\text{ and }f\text{ are small.}} & \text{(45)} \end{matrix}$

So in this case, embodiments of the present invention may optimize |Δ′(µ;x)|, which is proportional to the slope of the likelihood function ℙ(d|θ;ƒ,x) at θ = µ, and this task may be accomplished efficiently by embodiments of the present invention using the algorithms in Section 4.1.2.
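By way of non-limiting illustration, the two approximations of Eqs. (42) and (45) may be checked for the Chebyshev likelihood function, for which the bias is cos(mθ) with m = 2L + 1 and the Gaussian average of Eq. (37) is known in closed form, b = exp(-m²σ²/2)cos(mµ). The function names and the numerical values are hypothetical, and the closed-form V applies only to this particular choice of angles.

```python
import math

def clf_fisher_information(theta, f, num_layers):
    """J(theta; f, x) of Eq. (44) for the bias cos(m*theta), m = 2L + 1."""
    m = 2 * num_layers + 1
    delta = math.cos(m * theta)
    ddelta = -m * math.sin(m * theta)
    return (f * ddelta) ** 2 / (1.0 - (f * delta) ** 2)

def clf_variance_reduction_factor(mu, sigma, f, num_layers):
    """V(mu, sigma; f, x) of Eq. (38) for the same bias, in closed form."""
    m = 2 * num_layers + 1
    damping = math.exp(-0.5 * (m * sigma) ** 2)
    b = damping * math.cos(m * mu)
    db = -m * damping * math.sin(m * mu)
    return (f * db) ** 2 / (1.0 - (f * b) ** 2)

def low_fidelity_proxy(theta, f, num_layers):
    """f^2 * (Delta')^2, the small-f approximation of the Fisher information."""
    m = 2 * num_layers + 1
    return (f * m * math.sin(m * theta)) ** 2

# Small sigma: V is close to J (Eq. 42).
j_small_sigma = clf_fisher_information(0.8, 0.9, 1)
v_small_sigma = clf_variance_reduction_factor(0.8, 0.01, 0.9, 1)
# Small f as well: J is close to f^2 * (Delta')^2 (Eq. 45).
j_low_f = clf_fisher_information(0.8, 0.05, 1)
proxy_low_f = low_fidelity_proxy(0.8, 0.05, 1)
```

The gap between V and J grows with mσ, so the Fisher information proxy is most accurate for shallow circuits or narrow priors.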

Finally, embodiments of the present invention may make a prediction on how fast the MSE of the estimator µ̂_(t) of Π decreases as t grows. Suppose that the number L of circuit layers is fixed during the inference process. This gives

$\text{MSE}\left( {\hat{\mu}}_{t} \right) = \Theta\left( \frac{1}{t} \right)\text{ as }t\rightarrow\infty.$

The growth rate of the inverse MSE of µ̂_(t) may be predicted as follows. As t → ∞, we have µ_(t) → θ*, σ_(t) → 0, µ̂_(t) → Π*, and σ̂_(t) → 0 with high probability, where θ* and Π* are the true values of θ and Π, respectively. When this event happens, we have that for large t,

$\begin{matrix} {\frac{1}{\sigma_{t + 1}^{2}} - \frac{1}{\sigma_{t}^{2}} \approx J\left( {\mu_{t};f,{\overset{\rightarrow}{x}}_{t}} \right).} & \text{­­­(46)} \end{matrix}$

Consequently, by Eq. (35), we know that for large t,

$\begin{matrix} {\frac{1}{{\hat{\sigma}}_{t + 1}^{2}} - \frac{1}{{\hat{\sigma}}_{t}^{2}} \approx \frac{J\left( {\mu_{t};f,{\overset{\rightarrow}{x}}_{t}} \right)}{\sin^{2}\left( \mu_{t} \right)},} & \text{(47)} \end{matrix}$

where µ_(t) ≈ arccos (µ̂_(t)). Since the bias of µ̂_(t) is often much smaller than its standard deviation, and the latter can be approximated by σ̂_(t), we predict that for large t,

$\begin{matrix} {\text{MSE}\left( {\hat{\mu}}_{t} \right) \approx \frac{1 - {\hat{\mu}}_{t}^{2}}{tJ\left( {arccos\left( {\hat{\mu}}_{t} \right);f,{\overset{\rightarrow}{x}}_{t}} \right)}.} & \text{­­­(48)} \end{matrix}$

This means that the asymptotic growth rate (per time step) of the inverse MSE of µ̂_(t) should be roughly

$\begin{matrix} {{\hat{R}}_{0}\left( {\Pi^{\ast};f,\overset{\rightarrow}{x}} \right): = \frac{J\left( {arccos\,\left( \Pi^{\ast} \right);f,\overset{\rightarrow}{x}} \right)}{T(L)\left( {1 - \left( \Pi^{\ast} \right)^{2}} \right)},} & \text{­­­(49)} \end{matrix}$

where x is optimized with respect to µ* = arccos (Π*). This rate will be compared with the empirical growth rate of the inverse MSE of µ̂_(t) in Section 5.
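By way of non-limiting illustration, the predicted rate of Eq. (49) may be tabulated against the number of layers L. The sketch below makes two simplifying assumptions that are not part of the disclosure: the angles are fixed at the Chebyshev choice rather than optimized, and the fidelity is f = p̅p^(L) with the same layer fidelity p for every layer. The function name and the numerical values are hypothetical.

```python
import math

def predicted_rate_clf(pi_true, num_layers, layer_fidelity, spam_fidelity=1.0):
    """Eq. (49) evaluated for the bias cos(m*theta), m = 2L + 1, T(L) = m."""
    m = 2 * num_layers + 1
    theta = math.acos(pi_true)
    f = spam_fidelity * layer_fidelity ** num_layers
    fisher = (f * m * math.sin(m * theta)) ** 2 / (1.0 - (f * math.cos(m * theta)) ** 2)
    return fisher / (m * (1.0 - pi_true ** 2))

# Predicted rate for L = 1..10 at Pi* = 0.3 with layer fidelity 0.9.
rates = {L: predicted_rate_clf(0.3, L, layer_fidelity=0.9) for L in range(1, 11)}
best_num_layers = max(rates, key=rates.get)
```

Without error the Chebyshev rate is (2L + 1)/(1 - (Π*)²) and grows with L; with error the factor p^(L) eventually dominates, and the depth that maximizes the rate also depends on where the dead spots of each Chebyshev likelihood function fall relative to Π*.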

4. Efficient Heuristic Algorithms for Circuit Parameter Tuning and Bayesian Inference

This section describes embodiments of heuristic algorithms for tuning the parameters x of the circuit in FIG. 7 and describes how embodiments of the present invention may efficiently carry out Bayesian inference with the resultant likelihood functions.

4.1. Efficient Maximization of Proxies of the Variance Reduction Factor

Algorithms, implemented according to embodiments of the present invention, for tuning the circuit angles x are based on maximizing two proxies of the variance reduction factor V - the Fisher information and slope of the likelihood function ℙ(d|θ;f,x). All of these algorithms require efficient procedures for evaluating the CSBD coefficient functions of the bias Δ(θ;x) and its derivative Δ′(θ;x) with respect to x_(j) for j = 1,2, ...,2L. Recall that we have shown in Section 3.1 that the bias Δ(θ;x) is trigono-multiquadratic in x. Namely, for any j ∈ {1,2,...,2L}, there exist functions C_(j)(θ;x _(¬j)), S_(j)(θ; x _(¬j)) and B_(j)(θ;x _(¬j)) of x _(¬j):= (x₁, ..., x_(j-1), x_(j+1), ..., x_(2L)) such that

$\begin{matrix} \begin{array}{l} {\Delta\left( {\theta;\overset{\rightarrow}{x}} \right) = C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( {2x_{j}} \right) + S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( {2x_{j}} \right)} \\ {+ B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right).} \end{array} & \text{­­­(50)} \end{matrix}$

It follows that

$\begin{matrix} \begin{array}{l} {\Delta^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = C_{j}{}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( {2x_{j}} \right) + S_{j}{}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( {2x_{j}} \right)} \\ {+ {B^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} \end{array} & \text{­­­(51)} \end{matrix}$

is also trigono-multiquadratic in x, where C′_(j)(θ;x _(¬j)) = ∂_(θ)C_(j)(θ;x _(¬j)), S′_(j)(θ;x _(¬j)) = ∂_(θ)S_(j)(θ;x _(¬j)), B′_(j)(θ;x _(¬j)) = ∂_(θ)B_(j)(θ;x _(¬j)) are the derivatives of C_(j)(θ; x _(¬j)), S_(j)(θ;x _(¬j)), B_(j)(θ;x _(¬j)) with respect to θ, respectively. It turns out that these coefficient functions and their derivatives can be evaluated efficiently, as stated in the following lemma.

Lemma 1. Given θ and x _(¬j), each of C_(j)(θ;x _(¬j)), S_(j)(θ;x _(¬j)), B_(j)(θ;x _(¬j)), C′_(j)(θ;x _(¬j)), S′_(j)(θ;x _(¬j)) and B′_(j)(θ;x _(¬j)) can be computed in O(L) time.

Proof. See Appendix C.

4.1.1. Maximizing the Fisher Information of the Likelihood Function

Embodiments of the present invention may execute one or more of two algorithms for maximizing the Fisher information of the likelihood function

${\mathbb{P}}\left( {d\left| {\theta;f,\overset{\rightarrow}{x}} \right)} \right)$

at a given point θ = µ (i.e. the prior mean of θ). Assume that a goal is to find

$\overset{\rightarrow}{x} \in {\mathbb{R}}^{2L}$

that maximizes

$\begin{matrix} {J\left( {\mu;f,\overset{\rightarrow}{x}} \right) = \frac{f^{2}\left( {\Delta^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right)} \right)^{2}}{1 - f^{2}\Delta\left( {\mu;\overset{\rightarrow}{x}} \right)^{2}}.} & \text{­­­(52)} \end{matrix}$

The first algorithm is based on gradient ascent. Namely, it starts with a random initial point, and keeps taking steps proportional to the gradient of J at the current point, until a convergence criterion is satisfied. Specifically, let x ^((t)) be the parameter vector at iteration t. Embodiments of the present invention may update it as follows:

$\begin{matrix} {{\overset{\rightarrow}{x}}^{({t + 1})} = {\overset{\rightarrow}{x}}^{(t)} + \delta(t)\left. {\nabla J\left( {\mu;f,\overset{\rightarrow}{x}} \right)} \right|_{\overset{\rightarrow}{x} = {\overset{\rightarrow}{x}}^{(t)}},} & \text{(53)} \end{matrix}$

where δ: Z ^(≥0) → R ⁺ is the step size schedule⁵. This requires the calculation of the partial derivative of J(µ; f, x) with respect to each x_(j), which can be done as follows. Embodiments of the present invention first use the procedures in Lemma 1 to compute C_(j):= C_(j)(µ; x _(¬j)), S_(j):= S_(j)(µ; x _(¬j)), B_(j):= B_(j)(µ; x _(¬j)), C′_(j):= C′_(j)(µ; x _(¬j)), S′_(j):= S′_(j)(µ; x _(¬j)) and B′_(j):= B′_(j)(µ; x _(¬j)) for each j. This yields

$\begin{matrix} {\text{Δ:}\mspace{6mu}\text{=Δ}\left( {\mu;\overset{\rightarrow}{x}} \right) = C_{j}\cos\left( {2x_{j}} \right) + S_{j}\sin\left( {2x_{j}} \right) + B_{j},} & \text{­­­(54)} \end{matrix}$

$\begin{matrix} {\text{Δ}^{\prime}\text{:}\mspace{6mu}\text{=}\text{Δ}^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right) = {C^{\prime}}_{j}\cos\left( {2x_{j}} \right) + {S^{\prime}}_{j}\sin\left( {2x_{j}} \right) + {B^{\prime}}_{j},} & \text{­­­(55)} \end{matrix}$

$\begin{matrix} {\chi_{j}:\mspace{6mu} = \frac{\partial\Delta\left( {\mu;\overset{\rightarrow}{x}} \right)}{\partial x_{j}} = 2\left( {- C_{j}\sin\left( {2x_{j}} \right) + S_{j}\cos\left( {2x_{j}} \right)} \right),} & \text{­­­(56)} \end{matrix}$

$\begin{matrix} {{\chi^{\prime}}_{j}:\mspace{6mu} = \frac{\partial\Delta^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right)}{\partial x_{j}} = 2\left( {- {C^{\prime}}_{j}\sin\left( {2x_{j}} \right) + {S^{\prime}}_{j}\cos\left( {2x_{j}} \right)} \right);} & \text{­­­(57)} \end{matrix}$

Knowing these quantities, embodiments of the present invention may compute the partial derivative of J(µ; f, x) with respect to x_(j) as follows:

$\begin{matrix} {\gamma_{j} = \frac{\partial J\left( {\mu;f,\overset{\rightarrow}{x}} \right)}{\partial x_{j}} = \frac{2f^{2}\left\lbrack {\left( {1 - f^{2}\Delta^{2}} \right)\Delta^{\prime}{\chi^{\prime}}_{j} + f^{2}\Delta\chi_{j}\left( \Delta^{\prime} \right)^{2}} \right\rbrack}{\left\lbrack {1 - f^{2}\Delta^{2}} \right\rbrack^{2}}.} & \text{­­­(58)} \end{matrix}$

⁵In the simplest case, δ(t) = δ is constant. But in order to achieve better performance, we might want δ(t) → 0 as t → ∞.

Embodiments of the present invention may repeat this procedure for j = 1,2, ...,2L. Then embodiments of the present invention may obtain ∇J(µ; f, x) = (γ₁, γ₂, ..., γ_(2L)). Each iteration of the algorithm takes O(L²) time. The number of iterations in the algorithm depends on the initial point, the termination criterion and the step size schedule δ. See Algorithm 65 for more details.
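As an illustration of Eqs. (54)-(58), the sketch below computes one gradient component γ_(j) from the six coefficients, which are assumed to have already been obtained (for example by the procedures of Lemma 1). The function names are hypothetical; `J_of_xj` is included only so that the closed-form derivative can be compared against a finite difference.

```python
import math


def fisher_information(f, delta, delta_prime):
    """Eq. (52): J = f^2 (Delta')^2 / (1 - f^2 Delta^2)."""
    return f**2 * delta_prime**2 / (1.0 - f**2 * delta**2)


def fisher_partial(f, xj, C, S, B, Cp, Sp, Bp):
    """Eq. (58): partial derivative of J with respect to x_j."""
    c, s = math.cos(2 * xj), math.sin(2 * xj)
    delta = C * c + S * s + B            # Eq. (54)
    delta_p = Cp * c + Sp * s + Bp       # Eq. (55)
    chi = 2 * (-C * s + S * c)           # Eq. (56)
    chi_p = 2 * (-Cp * s + Sp * c)       # Eq. (57)
    denom = 1.0 - f**2 * delta**2
    numer = denom * delta_p * chi_p + f**2 * delta * chi * delta_p**2
    return 2 * f**2 * numer / denom**2


def J_of_xj(f, xj, C, S, B, Cp, Sp, Bp):
    """J as a function of the single coordinate x_j, all others held fixed."""
    c, s = math.cos(2 * xj), math.sin(2 * xj)
    return fisher_information(f, C * c + S * s + B, Cp * c + Sp * s + Bp)
```

A gradient-ascent iteration would call `fisher_partial` once per coordinate, recomputing the coefficients for each j, and then apply the update of Eq. (53).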

The second algorithm is based on coordinate ascent. Unlike gradient ascent, this algorithm does not require step sizes, and allows each variable to change dramatically in a single step. As a consequence, it may converge faster than the previous algorithm. Specifically, embodiments of the present invention which implement this algorithm may start with a random initial point, and successively maximize the objective function J(µ; f, x) along coordinate directions, until a convergence criterion is satisfied. At the j-th step of each round, it solves the following single-variable optimization problem for a coordinate x_(j):

$\begin{matrix} {\underset{z}{\arg\max}\frac{f^{2}\left( {{C^{\prime}}_{j}\cos\left( {2z} \right) + {S^{\prime}}_{j}\sin\left( {2z} \right) + {B^{\prime}}_{j}} \right)^{2}}{1 - f^{2}\left( {C_{j}\cos\left( {2z} \right) + S_{j}\sin\left( {2z} \right) + B_{j}} \right)^{2}},} & \text{(59)} \end{matrix}$

where C_(j) = C_(j)(µ; x _(¬j)), S_(j) = S_(j)(µ; x _(¬j)), B_(j) = B_(j)(µ; x _(¬j)), C′_(j) = C′_(j)(µ; x _(¬j)), S′_(j) = S′_(j)(µ; x _(¬j)), B′_(j) = B′_(j)(µ;x _(¬j)) may be computed in O(L) time by the procedures in Lemma 1. This single-variable optimization problem may be tackled by standard gradient-based methods, and x_(j) is set to its solution. This procedure is repeated for j = 1,2, ...,2L. This algorithm produces a sequence x ⁽⁰⁾, x ⁽¹⁾, x ⁽²⁾, ..., such that J(µ; f, x ⁽⁰⁾) ≤ J(µ; f, x ⁽¹⁾) ≤ J(µ; f, x ⁽²⁾) ≤ ... . Namely, the value of J(µ; f, x ^((t))) never decreases as t grows. Each round of the algorithm takes O(L²) time. The number of rounds in the algorithm depends on the initial point and the termination criterion.
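The single-coordinate problem of Eq. (59) can be sketched as follows. The text above suggests gradient-based methods; purely for simplicity, this sketch swaps in a nested grid search over one period z ∈ [0, π], which is an assumption of the example and not the method described. The function names and the `grid` and `rounds` parameters are hypothetical, and f|Δ| < 1 is assumed so that the denominator stays positive.

```python
import math


def elf_objective(z, f, C, S, B, Cp, Sp, Bp):
    """Objective of Eq. (59) as a function of the single coordinate z."""
    c, s = math.cos(2 * z), math.sin(2 * z)
    delta = C * c + S * s + B
    delta_p = Cp * c + Sp * s + Bp
    return f**2 * delta_p**2 / (1.0 - f**2 * delta**2)


def coordinate_update(f, C, S, B, Cp, Sp, Bp, grid=2048, rounds=4):
    """Approximate arg max of Eq. (59) by repeatedly zooming a uniform grid."""
    lo, hi = 0.0, math.pi              # the objective has period pi in z
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / grid
        candidates = (lo + i * step for i in range(grid + 1))
        best = max(candidates, key=lambda z: elf_objective(z, f, C, S, B, Cp, Sp, Bp))
        lo, hi = best - step, best + step   # zoom in around the current best
    return best, elf_objective(best, f, C, S, B, Cp, Sp, Bp)
```

One round of coordinate ascent would call `coordinate_update` for j = 1, 2, ..., 2L, refreshing the six coefficients before each call.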

4.1.2. Maximizing the Slope of the Likelihood Function

Embodiments of the present invention may perform one or more of two algorithms for maximizing the slope of the likelihood function P(d|θ; f, x) at a given point θ = µ (i.e. the prior mean of θ). Assume that a goal is to find x ∈ R ^(2L) that maximizes |P′(µ; f, x) | = f|Δ′(µ; x)|/2.

Similar to Algorithms 65 and 65 for Fisher information maximization, the algorithms for slope maximization are also based on gradient ascent and coordinate ascent, respectively. They both call the procedures in Lemma 1 to evaluate C′(µ; x _(¬j)), S′(µ; x _(¬j)) and B′(µ; x _(¬j)) for given µ and x _(¬j). However, the gradient-ascent-based algorithm uses the above quantities to compute the partial derivative of (Δ′ (µ; x))² with respect to x_(j), while the coordinate-ascent-based algorithm uses them to directly update the value of x_(j). These algorithms are formally described in Algorithms 1 and 2, respectively.
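For slope maximization, the single-coordinate problem has a closed-form solution, which is one way the direct update of x_(j) could look. By Eq. (51), the quantity to maximize over z is |C′_(j) cos(2z) + S′_(j) sin(2z) + B′_(j)|, and the first two terms combine into a single sinusoid of amplitude sqrt(C′_(j)² + S′_(j)²). The sketch below is derived from that identity rather than taken from the formal algorithm listings; the function name is hypothetical, and the constant factor f/2 is omitted because it does not change the maximizer.

```python
import math


def slope_coordinate_update(Cp, Sp, Bp):
    """Return (z, value) maximizing |Cp*cos(2z) + Sp*sin(2z) + Bp| over z."""
    amplitude = math.hypot(Cp, Sp)
    # Cp*cos(2z) + Sp*sin(2z) = amplitude * cos(2z - phase)
    phase = math.atan2(Sp, Cp)
    if Bp < 0:
        phase += math.pi      # make the sinusoid take the same sign as Bp
    z = 0.5 * phase
    return z, amplitude + abs(Bp)
```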

4.2. Approximate Bayesian Inference With Engineered Likelihood Functions

With the algorithms for tuning the circuit parameters x in place, we now describe how to efficiently carry out Bayesian inference with the resultant likelihood functions. Embodiments of the present invention may compute the posterior mean and variance of θ directly after receiving a measurement outcome d. But this approach is time-consuming, as it involves numerical integration. By taking advantage of certain properties of the engineered likelihood functions, embodiments of the present invention may greatly accelerate this process.

Suppose θ has prior distribution N(µ, σ²), where σ « ⅟L, and the fidelity of the process for generating the ELF is f. Embodiments of the present invention may find that the parameters x = (x₁, x₂, ..., x_(2L)) that maximize J(µ; f, x) (or |Δ′(µ; x)|) satisfy the following property: when θ is close to µ, i.e. θ ∈ [µ - O(σ), µ + O(σ)], we have

$\begin{matrix} {{\mathbb{P}}\left( {d|\theta;f,\overset{\rightarrow}{x}} \right) \approx \frac{1 + \left( {- 1} \right)^{d}f\sin\left( {r\theta + b} \right)}{2}} & \text{(67)} \end{matrix}$

for some r, b ∈ R. Namely, embodiments of the present invention may approximate Δ(θ;x) by a sinusoidal function in this region of θ. FIG. 13 illustrates one such example.

Embodiments of the present invention may find the best-fitting r and b by solving the following least squares problem:

$\begin{matrix} {\left( {r^{\ast},b^{\ast}} \right) = \underset{r,b}{\arg\min}{\sum\limits_{\theta \in \text{Θ}}\left| {\arcsin\left( {\text{Δ}\left( {\theta;\overset{\rightarrow}{x}} \right)} \right) - r\theta - b} \right|^{2}},} & \text{­­­(68)} \end{matrix}$

where Θ = {θ₁, θ₂, ..., θ_(k)} ⊆ [µ - O(σ), µ + O(σ)]. This least-squares problem has the following analytical solution:

$\begin{matrix} {\left( \begin{array}{l} r^{\ast} \\ b^{\ast} \end{array} \right) = A^{+}z = \left( {A^{T}A} \right)^{- 1}A^{T}z,} & \text{­­­(69)} \end{matrix}$

where

$\begin{matrix} {A = \left( \begin{array}{ll} \theta_{1} & 1 \\ \theta_{2} & 1 \\  \vdots & \vdots \\ \theta_{k} & 1 \end{array} \right),z = \begin{pmatrix} {\arcsin\left( {\text{Δ}\left( {\theta_{1};\overset{\rightarrow}{x}} \right)} \right)} \\ {\arcsin\left( {\text{Δ}\left( {\theta_{2};\overset{\rightarrow}{x}} \right)} \right)} \\  \vdots \\ {\arcsin\left( {\text{Δ}\left( {\theta_{k};\overset{\rightarrow}{x}} \right)} \right)} \end{pmatrix}.} & \text{­­­(76)} \end{matrix}$

FIG. 13 demonstrates an example of the true and fitted likelihood functions.
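The least-squares fit of Eqs. (68)-(69) can be sketched in a few lines. For the two-column matrix A defined above, the pseudo-inverse solution reduces to the familiar simple linear regression formulas, which is what the code below uses. The function name `fit_sinusoid` is hypothetical, and the values of Δ are assumed to lie on the principal branch of arcsin over the fitting window.

```python
import math


def fit_sinusoid(thetas, delta_values):
    """Least-squares fit of arcsin(Delta(theta)) to r*theta + b, as in Eq. (68)."""
    k = len(thetas)
    # Clip to [-1, 1] to guard against rounding before taking arcsin.
    z = [math.asin(max(-1.0, min(1.0, d))) for d in delta_values]
    theta_bar = sum(thetas) / k
    z_bar = sum(z) / k
    sxx = sum((t - theta_bar) ** 2 for t in thetas)
    sxz = sum((t - theta_bar) * (zi - z_bar) for t, zi in zip(thetas, z))
    r = sxz / sxx                 # slope, first component of (A^T A)^{-1} A^T z
    b = z_bar - r * theta_bar     # intercept, second component
    return r, b
```

Centering θ before forming the sums keeps the computation well conditioned when the sample points are tightly clustered around µ, as they are when σ is small.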

Once embodiments of the present invention obtain the optimum r and b, they may approximate the posterior mean and variance of θ by the ones for

$\begin{matrix} {{\mathbb{P}}\left( {d|\theta;f} \right) \approx \frac{1 + \left( {- 1} \right)^{d}f\sin\left( {r\theta + b} \right)}{2},} & \text{(77)} \end{matrix}$

which have analytical formulas. Specifically, suppose θ has prior distribution

N(μ_(k), σ_(k)²)

at round k. Let d_(k) be the measurement outcome and (r_(k), b_(k)) be the best-fitting parameters at this round. Then embodiments of the present invention may approximate the posterior mean and variance of θ by

$\begin{matrix} {\mu_{k + 1} = \mu_{k} + \frac{\left( {- 1} \right)^{d_{k}}fe^{{- r_{k}^{2}\sigma_{k}^{2}}/2}r_{k}\sigma_{k}^{2}\cos\left( {r_{k}\mu_{k} + b_{k}} \right)}{1 + \left( {- 1} \right)^{d_{k}}fe^{{- r_{k}^{2}\sigma_{k}^{2}}/2}\sin\left( {r_{k}\mu_{k} + b_{k}} \right)},} & \text{(78)} \end{matrix}$

$\begin{matrix} {\sigma_{k + 1}^{2} = \sigma_{k}^{2}\left( {1 - \frac{fr_{k}^{2}\sigma_{k}^{2}e^{{- r_{k}^{2}\sigma_{k}^{2}}/2}\left\lbrack {fe^{{- r_{k}^{2}\sigma_{k}^{2}}/2} + \left( {- 1} \right)^{d_{k}}\sin\left( {r_{k}\mu_{k} + b_{k}} \right)} \right\rbrack}{\left\lbrack {1 + \left( {- 1} \right)^{d_{k}}fe^{{- r_{k}^{2}\sigma_{k}^{2}}/2}\sin\left( {r_{k}\mu_{k} + b_{k}} \right)} \right\rbrack^{2}}} \right).} & \text{(79)} \end{matrix}$

After that, embodiments of the present invention may proceed to the next round, setting

N(μ_(k + 1), σ_(k + 1)²)

as the prior distribution of θ for that round.
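The update of Eqs. (78) and (79) is cheap to evaluate, as the following sketch shows. The function name `elf_posterior` and its argument order are hypothetical; the formulas are exactly those above, with the common factor f·exp(−r²σ²/2) computed once.

```python
import math


def elf_posterior(mu, sigma, d, f, r, b):
    """Eqs. (78)-(79): posterior mean and variance for a sinusoidal likelihood.

    mu, sigma: prior mean and standard deviation of theta
    d:         measurement outcome, 0 or 1
    f:         fidelity; r, b: fitted sinusoid parameters
    """
    sign = -1.0 if d == 1 else 1.0                  # (-1)^d
    g = f * math.exp(-0.5 * r**2 * sigma**2)        # f * exp(-r^2 sigma^2 / 2)
    phase = r * mu + b
    denom = 1.0 + sign * g * math.sin(phase)
    mu_next = mu + sign * g * r * sigma**2 * math.cos(phase) / denom
    shrink = g * r**2 * sigma**2 * (g + sign * math.sin(phase)) / denom**2
    var_next = sigma**2 * (1.0 - shrink)
    return mu_next, var_next
```

Each inference round would call this once with the outcome d_(k) and the fitted (r_(k), b_(k)), then pass the returned mean and the square root of the returned variance to the next round as the new prior.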

Note that, as FIG. 13 illustrates, the difference between the true and fitted likelihood functions can be large when θ is far from µ, i.e. |θ - µ| » σ. But since the prior distribution p(θ) =

$\frac{1}{\sqrt{2\pi}\sigma}e^{- \frac{{({\theta - \mu})}^{2}}{2\sigma^{2}}}$

decays exponentially in |θ - µ|, such θ′s have little contribution to the computation of posterior mean and variance of θ. So Eqs. (78) and (79) give highly accurate estimates of the posterior mean and variance of θ, and their errors have negligible impact on the performance of the whole algorithm.

5. Simulation Results

This section describes certain results of simulating Bayesian inference with engineered likelihood functions for amplitude estimation. These results demonstrate certain advantages of certain engineered likelihood functions over unengineered ones, as well as the impacts of circuit depth and fidelity on their performance.

5.1. Experimental Details

In our experiments, we assume that it takes much less time to implement U(x) = exp (-ixP) and perform the projective measurement

$\left\{ {\frac{I + P}{2},\frac{I - P}{2}} \right\}$

than to implement A. So when the number of circuit layers is L, the time cost of an inference round is roughly (2L + 1)T(A), where T(A) is the time cost of A (note that an L-layer circuit makes 2L + 1 uses of A and A^(†)). For simplicity, assume that A takes unit time (i.e. T(A) = 1) in the upcoming discussion. Moreover, assume that there is no error in the preparation and measurement of quantum states, i.e. p̅ = 1, in the experiments.

Suppose we aim to estimate the expectation value Π = cos (θ) = 〈A|P|A〉. Let µ̂_(t) be the estimator of Π at time t. Note that µ̂_(t) itself is a random variable, since it depends on the history of random measurement outcomes up to time t. We measure the performance of a scheme by the root-mean-squared error (RMSE) of µ̂_(t), which is given by

$\begin{matrix} {\text{RMSE}_{\text{t}} = \sqrt{\text{MSE}_{\text{t}}} = \sqrt{\mathbb{E}\left\lbrack \left( {{\hat{\mu}}_{t} - \text{Π}} \right)^{2} \right\rbrack}.} & \text{­­­(80)} \end{matrix}$

The following will describe how fast RMSE_(t) decays as t grows for various schemes, including the ancilla-based Chebyshev likelihood function (AB CLF), ancilla-based engineered likelihood function (AB ELF), ancilla-free Chebyshev likelihood function (AF CLF), and ancilla-free engineered likelihood function (AF ELF).

In general, the distribution of µ̂_(t) is difficult to characterize, and there is no analytical formula for RMSE_(t). To estimate this quantity, embodiments of the present invention may execute the inference process M times, and collect M samples

μ̂_(t)⁽¹⁾, μ̂_(t)⁽²⁾, …, μ̂_(t)^((M))

of µ̂_(t), where

μ̂_(t)^((i))

is the estimate of Π at time t in the i-th run, for i = 1,2, ..., M. Then embodiments of the present invention may use the quantity

$\begin{matrix} {\overline{\text{RMSE}_{\text{t}}}:\mspace{6mu} = \sqrt{\frac{1}{M}{\sum\limits_{i = 1}^{M}\left( {{\hat{\mu}}_{t}^{(i)} - \text{Π}} \right)^{2}}}.} & \text{­­­(81)} \end{matrix}$

to approximate the true RMSE_(t). In our experiments, we set M = 300 and find that this leads to satisfactory results.
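The empirical estimate of Eq. (81) amounts to the following one-liner, shown here only to pin down the normalization (division by M, not M − 1). The function name is hypothetical.

```python
import math


def empirical_rmse(estimates, true_value):
    """Eq. (81): root-mean-squared error over M independent runs."""
    m = len(estimates)
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / m)
```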

Embodiments of the present invention may use coordinate-ascent-based Algorithms 2 and 6 to optimize the circuit parameters x in the ancilla-free and ancilla-based cases, respectively. We find that Algorithms 1 and 2 produce solutions of equal quality, and the same statement holds for Algorithms 5 and 6. So our experimental results would not change if we had used gradient-ascent-based Algorithms 1 and 5 to tune the circuit angles x instead.

For Bayesian update with ELFs, embodiments of the present invention may use the methods in Section 4.2 and Appendix A.2 to compute the posterior mean and variance of θ in the ancilla-free and ancilla-based cases, respectively. In particular, during the sinusoidal fitting of ELFs, embodiments of the present invention may set Θ = {µ - σ, µ - 0.8σ, ..., µ + 0.8σ, µ + σ} (i.e. Θ contains 11 uniformly distributed points in [µ - σ, µ + σ]) in Eqs. (68) and (148). We find that this is sufficient for obtaining high-quality sinusoidal fits of ELFs.

6. A Model for Noisy Algorithm Performance

Embodiments of the present invention may implement a model for the runtime needed to achieve a target mean-squared error in the estimate of Π as it is scaled to larger systems and run on devices with better gate fidelities. This model may be built on two main assumptions. The first is that the growth rate of the inverse mean squared error is well-described by half the inverse variance rate expression (c.f. Eq. (40)). The half is due to the conservative estimate that the variance and squared bias contribute equally to the mean squared error (simulations from the previous section show that the squared bias tends to be less than the variance). The second assumption is an empirical lower bound on the variance reduction factor, which is motivated by numerical investigations of the Chebyshev likelihood function.

We carry out analysis for the MSE with respect to the estimate of θ. We will then convert the MSE of this estimate to an estimate of MSE with respect to Π. Our strategy will be to integrate upper and lower bounds for the rate expression R(µ, σ; f, m) in Eq. (40) to arrive at bounds for inverse MSE as a function of time.

To help our analysis we make the substitution m = T(L) = 2L + 1 and reparameterize the way noise is incorporated by introducing λ and α such that ƒ² = p̅²p^(2L) = e^(-λ(2L+1)-α) = e^(-λm-α).

The upper and lower bounds on this rate expression are based on findings for the Chebyshev likelihood functions, where x = (π/2)^(2L). Since the Chebyshev likelihood functions are a subset of the engineered likelihood functions, a lower bound on the Chebyshev performance gives a lower bound on the ELF performance. We leave as a conjecture that the upper bound for this rate in the case of ELF is a small multiple (e.g. 1.5) of the upper bound we have established for the Chebyshev rate.

The Chebyshev upper bound is established as follows. For fixed σ, λ, and m, one can show⁶ that the variance reduction factor achieves a maximum value of V = m² exp(−m²σ² − λm − α), occurring at µ = π/2. This expression is less than m² e^(−m²σ²), which achieves a maximum of

$\left( {e\sigma^{2}} \right)^{- 1}\ \text{at}\ m = \frac{1}{\sigma}.$

Thus, the factor 1/(1 − σ²V) cannot exceed 1/(1 − e⁻¹) ≈ 1.582. Putting this all together, for fixed σ, λ, and m, the maximum rate is upper bounded as R(µ, σ; λ, α, m) ≤

$\frac{em}{e - 1}\exp\left( {- m^{2}\sigma^{2} - \lambda m - \alpha} \right).$

This follows from the fact that R is monotonic in V and that V is maximized at µ = π/2. In practice, embodiments of the present invention may use a value of L which maximizes the inverse variance rate. The rate achieved by discrete L cannot exceed the value obtained when optimizing the above upper bound over continuous value of m. This optimal value is realized for

${1/m} = \frac{1}{2}\left( {\sqrt{\lambda^{2} + 8\sigma^{2}} + \lambda} \right).$

We define R̅(σ; λ, α) by evaluating R(π/2, σ; λ, α, m) at this optimum value,

$\begin{matrix} {\overline{R}\left( {\sigma;\lambda,\alpha} \right) = \frac{2e^{- \alpha - 1}}{\sqrt{\lambda^{2} + 8\sigma^{2}} + \lambda}\exp\left( \frac{2\sigma^{2}}{4\sigma^{2} + \lambda^{2} + \lambda^{2}\sqrt{1 + 8\sigma^{2}/\lambda^{2}}} \right),} & \text{(82)} \end{matrix}$

which gives the upper bound on the Chebyshev rate

$\begin{matrix} {R_{C}^{\ast}\left( {\mu,\sigma;\lambda,\alpha} \right) = \max\limits_{L}R\left( {\mu,\sigma;\lambda,\alpha,m} \right) \leq \frac{e}{e - 1}\overline{R}\left( {\sigma;\lambda,\alpha} \right).} & \text{(83)} \end{matrix}$

⁶ For the Chebyshev likelihood functions, we can express the variance reduction factor as

$\mathcal{V}\left( {\mu,\sigma;f,\left( \frac{\pi}{2} \right)^{2L}} \right) = m_{L}^{2}/\left( {1 + \left( {f^{- 2}e^{m_{L}^{2}\sigma^{2}} - 1} \right)\csc^{2}\left( {m_{L}\mu} \right)} \right)$

whenever sin (m_(L)µ) ≠ 0. Then,

$\csc^{2}\left( {m_{L}\mu} \right) \geq 1$

implies that

$\mathcal{V}\left( {\mu,\sigma;f,\left( \frac{\pi}{2} \right)^{2L}} \right) \leq f^{2}m_{L}^{2}e^{- m_{L}^{2}\sigma^{2}}.$

Embodiments of the present invention do not have an analytic lower bound on the Chebyshev likelihood performance. We can establish an empirical lower bound based on numerical checks. For any fixed L, the inverse variance rate is zero at the 2L + 2 points µ ∈ {0, π/(2L + 1),2π/(2L + 1), ...,2Lπ/(2L + 1), π}. Since the rate is zero at these end points for all L, the global lower bound on

R_(C)^(*)

is zero. However, we are not concerned with the poor performance of the inverse variance rate near these end points. When we convert the estimator from θ̂ to Π̂ = cos θ̂, the information gain near these end points actually tends to a large value. For the purpose of establishing useful bounds, we will restrict µ to be in the range [0.1π, 0.9π]. In the numerical tests⁷ we find that for all µ ∈ [0.1π, 0.9π], there is always a choice of L for which the inverse variance rate is above (e - 1)²/e² ≈ 0.40 times the upper bound. Putting these together, we have

$\begin{matrix} {\frac{e - 1}{e}\overline{R}\left( {\sigma;\lambda,\alpha} \right) \leq R_{C}^{\ast}\left( {\mu,\sigma;\lambda,\alpha} \right) \leq \frac{e}{e - 1}\overline{R}\left( {\sigma;\lambda,\alpha} \right).} & \text{­­­(84)} \end{matrix}$

It is important to note that by letting m be continuous, certain values of σ and λ can lead to an optimal m for which L = (m − 1)/2 is negative. Therefore, these results apply only in the case that λ ≤ 1, which ensures that m ≥ 1. We expect this model to break down in the large-noise regime (i.e. λ ≥ 1).

For now, we will assume that the rate tracks the geometric mean of these two bounds, i.e.

$R_{C}^{\ast}\left( {\sigma,\lambda,\mu} \right) = \overline{R}\left( {\sigma,\lambda} \right),$

keeping in mind that the upper and lower bounds are small constant factors off of this.

Assume that the inverse variance grows continuously in time at a rate given by the difference quotient expression captured by the inverse-variance rate,

$R^{\ast} = \frac{d}{dt}\frac{1}{\sigma^{2}}.$

Letting F = ⅟σ² denote this inverse variance, the rate equation above can be recast as a differential equation for F,

⁷ We searched over a uniform grid of 50000 values of θ, L values from L*/3 to 3L*, where L* is the optimized value used to arrive at Eq. (82), and σ and λ ranging over [10⁻¹, 10⁻², ..., 10⁻⁵]. For each (σ, λ) pair we found the θ for which the maximum inverse variance rate (over L) is a minimum. For all (σ, λ) pairs checked, this worst-case rate was always between 0.4 and 0.5, with the smallest value found being R = 0.41700368 ≥ (e - 1)²/e².

$\begin{matrix} {\frac{dF}{dt} = \frac{2e^{- \alpha - 1}}{\lambda\sqrt{1 + {8/\left( {F\lambda^{2}} \right)}} + \lambda}\exp\left( \frac{2}{4 + \lambda^{2}F + \lambda^{2}F\sqrt{1 + {8/\left( {\lambda^{2}F} \right)}}} \right).} & \text{­­­(85)} \end{matrix}$

Through this expression, we can identify both the Heisenberg limit behavior and shot-noise limit behavior. For F « ⅟λ², the differential equation becomes

$\begin{matrix} {\frac{dF}{dt} = \frac{e^{- \alpha - {1/2}}}{\sqrt{2}}\sqrt{F},} & \text{­­­(86)} \end{matrix}$

which integrates to a quadratic growth of the inverse squared error F(t) ~ t². This is the signature of the Heisenberg limit regime. For F » ⅟λ², the rate approaches a constant,

$\begin{matrix} {\frac{dF}{dt} = \frac{e^{- \alpha - 1}}{\lambda}.} & \text{­­­(87)} \end{matrix}$

This regime yields a linear growth in the inverse squared error F(t) ~ t, indicative of the shot-noise limit regime.
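The rate expression of Eq. (82) and its two limiting regimes can be checked numerically with the short sketch below. The function names `optimal_m` and `rate_bar` are hypothetical. Here `optimal_m` returns the continuous maximizer of m·exp(−m²σ² − λm) stated above, and `rate_bar` implements Eq. (82), using the identity λ²·sqrt(1 + 8σ²/λ²) = λ·sqrt(λ² + 8σ²) for λ ≥ 0.

```python
import math


def optimal_m(sigma, lam):
    """Continuous m maximizing m * exp(-m^2 sigma^2 - lam * m)."""
    return 2.0 / (math.sqrt(lam**2 + 8 * sigma**2) + lam)


def rate_bar(sigma, lam, alpha):
    """Eq. (82): the rate expression evaluated at the optimal continuous m."""
    root = math.sqrt(lam**2 + 8 * sigma**2)
    prefactor = 2.0 * math.exp(-alpha - 1.0) / (root + lam)
    exponent = 2 * sigma**2 / (4 * sigma**2 + lam**2 + lam * root)
    return prefactor * math.exp(exponent)


if __name__ == "__main__":
    # With F = 1/sigma^2: F << 1/lam^2 is the regime of Eq. (86),
    # and F >> 1/lam^2 is the regime of Eq. (87).
    print(rate_bar(1e-2, 1e-6, 0.0), math.exp(-0.5) / (math.sqrt(2) * 1e-2))
    print(rate_bar(1e-6, 1e-2, 0.0), math.exp(-1.0) / 1e-2)
```

Each printed pair compares Eq. (82) with the corresponding limiting form, with σ = F^(−1/2) substituted into Eq. (86).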

In order to make the integral tractable, the rate expression may be replaced with integrable upper and lower bound expressions (to be used in tandem with our previous bounds). Letting x = λ²F, these bounds are re-expressed as,

$\begin{matrix} \begin{array}{l} {\frac{2e^{- \alpha - 1}\lambda}{1 + \frac{1}{\sqrt{12x}} + \frac{x + 4}{\sqrt{x^{2} + 8x}}} \geq \frac{dx}{dt}} \\ {\geq \frac{2e^{- \alpha - 1}\lambda}{1 + \frac{1}{\sqrt{4x}} + \frac{x + 4}{\sqrt{x^{2} + 8x}}}.} \end{array} & \text{(88)} \end{matrix}$

From the upper bound we can establish a lower bound on the runtime, by treating time as a function of x and integrating,

$\begin{matrix} {{\int_{0}^{t}{dt}} \geq {\int_{x_{0}}^{x_{f}}{dx}}\,\frac{e^{\alpha + 1}}{2\lambda}\left( {1 + \frac{1}{\sqrt{12x}} + \frac{x + 4}{\sqrt{x^{2} + 8x}}} \right)} & \text{(89)} \end{matrix}$

$\begin{matrix} {= \frac{e^{\alpha + 1}}{2\lambda}\left( {x_{f} + \sqrt{x_{f}/3} + \sqrt{x_{f}^{2} + 8x_{f}} - x_{0} - \sqrt{x_{0}/3} - \sqrt{x_{0}^{2} + 8x_{0}}} \right).} & \text{(90)} \end{matrix}$

Similarly, we can use the lower bound to establish an upper bound on the runtime. Here we introduce our assumption that, in the worst case, the MSE of the phase estimate is twice the variance (i.e. the variance equals the squared bias), so the variance must reach half the MSE: ε_(θ)²/2 = λ²/x. In the best case, we assume the bias in the estimate is zero and set ε_(θ)² = λ²/x. We combine these bounds with the upper and lower bounds of Eq. (84) to arrive at the bounds on the estimation runtime as a function of target MSE,

$\begin{matrix} \begin{array}{l} {\left( {e - 1} \right)\frac{e^{- \lambda}}{2{\overline{p}}^{2}}\left( {\frac{\lambda}{\varepsilon_{\theta}^{2}} + \frac{1}{\sqrt{3}\varepsilon_{\theta}} + \sqrt{\left( \frac{\lambda}{\varepsilon_{\theta}^{2}} \right)^{2} + \left( \frac{2\sqrt{2}}{\varepsilon_{\theta}} \right)^{2}}} \right) \leq t_{\varepsilon_{\theta}}} \\ {\leq \frac{e^{2}}{e - 1}\frac{e^{- \lambda}}{{\overline{p}}^{2}}\left( {\frac{\lambda}{\varepsilon_{\theta}^{2}} + \frac{1}{\sqrt{2}\varepsilon_{\theta}} + \sqrt{\left( \frac{\lambda}{\varepsilon_{\theta}^{2}} \right)^{2} + \left( \frac{2}{\varepsilon_{\theta}} \right)^{2}}} \right),} \end{array} & \text{(91)} \end{matrix}$

where θ ∈ [0.1π, 0.9π].

At this point, we can convert our phase estimate θ̂ back into the amplitude estimate Π̂. The MSE with respect to the amplitude estimate

ε_(∏)²

can be approximated in terms of the phase estimate MSE as

$\begin{matrix} \begin{matrix} {\varepsilon_{\Pi}^{2} = \mathbb{E}\left( {\hat{\Pi} - \Pi} \right)^{2}} \\ {= \mathbb{E}\left( {\cos\hat{\theta} - \cos\theta} \right)^{2}} \\ {\approx \mathbb{E}\left( {\left( {\hat{\theta} - \theta} \right)\frac{d\cos\theta}{d\theta}} \right)^{2}} \\ {= \varepsilon_{\theta}^{2}\sin^{2}\theta,} \end{matrix} & \text{(92)} \end{matrix}$

where we have assumed that the distribution of the estimator is sufficiently peaked about θ to ignore higher-order terms. Since sin²θ = 1 − Π², this leads to ε_(θ)² = ε_(∏)²/(1 − Π²), which can be substituted into the above expressions for the bounds, which hold for Π ∈ [cos 0.9π, cos 0.1π] ≈ [-0.95, 0.95]. Dropping the estimator subscripts (as they only contribute constant factors), we can establish the runtime scaling in the low-noise and high-noise limits,

$\begin{matrix} {t_{\varepsilon} = \left\{ \begin{array}{ll} {O\left( {e^{\alpha}/\varepsilon} \right)} & {\lambda \ll \varepsilon,} \\ {O\left( {{e^{\alpha}\lambda}/\varepsilon^{2}} \right)} & {\lambda \gg \varepsilon,} \end{array} \right.} & \text{(94)} \end{matrix}$

observing that the Heisenberg-limit scaling and shot-noise limit scaling are each recovered.

We arrived at these bounds using properties of Chebyshev likelihood functions. As we have shown in the previous section, by engineering likelihood functions, in many cases we can reduce estimation runtimes. Motivated by our numerical findings of the variance reduction factors of engineered likelihood functions (see, e.g. FIG. 19 ), we conjecture that using engineered likelihood functions increases the worst case inverse-variance rate in Eq. (84) to

$\overline{R}\left( {\sigma;\lambda,\alpha} \right) \leq R_{C}^{\ast}\left( {\mu,\sigma;\lambda,\alpha} \right).$

In order to give more meaning to this model, we will refine it to be in terms of number of qubits n and two-qubit gate fidelities f_(2Q). We consider the task of estimating the expectation value of a Pauli string P with respect to state |A〉. Assume that Π = 〈A|P|A〉 is very near zero so that ε² =

ε_(∏)² ≈ ε_(θ)².

Let the two-qubit gate depth of each of the L layers be D. We model the total layer fidelity as

p = f_(2Q)^(nD/2),

where we have ignored errors due to single-qubit gates. From this, we have λ =

$\frac{1}{2}nD\mspace{6mu}\ln\mspace{6mu}\left( {1/f_{2Q}} \right)$

and

$\alpha = 2\mspace{6mu}\ln\mspace{6mu}\left( {1/\overline{p}} \right) - \frac{1}{2}nD\mspace{6mu}\ln\mspace{6mu}\left( {1/f_{2Q}} \right).$

Putting these together, we arrive at the runtime expression,

$\begin{matrix} {t_{\varepsilon} = e\frac{f_{2Q}^{{nD}/2}}{2{\overline{p}}^{2}}\left( {\frac{nD\ln\left( {1/f_{2Q}} \right)}{2\varepsilon^{2}} + \frac{1}{\sqrt{6}\varepsilon} + \sqrt{\left( \frac{nD\ln\left( {1/f_{2Q}} \right)}{2\varepsilon^{2}} \right)^{2} + \left( \frac{2\sqrt{2}}{\varepsilon} \right)^{2}}} \right).} & \text{(95)} \end{matrix}$

Finally, we will put some meaningful numbers in this expression and estimate the required runtime in seconds as a function of two-qubit gate fidelities. To achieve quantum advantage we expect that the problem instance will require on the order of n = 100 logical qubits and that the two-qubit gate depth is on the order of the number of qubits, D = 200. Furthermore, we expect that target accuracies ε will need to be on the order of ε = 10⁻³ to 10⁻⁵. The runtime model measures time in terms of ansatz circuit durations. To convert this into seconds we assume each layer of two-qubit gates will take time G = 10⁻⁸s, which is an optimistic assumption for today’s superconducting qubit hardware. FIG. 26 shows this estimated runtime as a function of two-qubit gate fidelity.
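To show how Eq. (95) could be evaluated, the sketch below computes the runtime in units of the ansatz duration and then converts it to seconds. The conversion assumes that one use of A lasts D layers of two-qubit gates of duration G each, so that the ansatz duration is D·G; this reading of the text, the function names, and the default p̅ = 1 are assumptions of the example. No output values are quoted, since FIG. 26 is not reproduced here.

```python
import math


def runtime_in_ansatz_units(n, D, f2q, eps, p_bar=1.0):
    """Eq. (95): estimated runtime in units of the duration of A."""
    lam = 0.5 * n * D * math.log(1.0 / f2q)     # lambda = (1/2) n D ln(1/f_2Q)
    a = lam / eps**2
    bracket = (
        a
        + 1.0 / (math.sqrt(6) * eps)
        + math.sqrt(a**2 + (2 * math.sqrt(2) / eps) ** 2)
    )
    return math.e * f2q ** (n * D / 2) / (2 * p_bar**2) * bracket


def runtime_in_seconds(n, D, f2q, eps, gate_time=1e-8, p_bar=1.0):
    """Convert to seconds, assuming one use of A takes D * gate_time seconds."""
    return runtime_in_ansatz_units(n, D, f2q, eps, p_bar) * D * gate_time


if __name__ == "__main__":
    # Parameters quoted in the text: n = 100, D = 200, G = 1e-8 s.
    for f2q in (0.999, 0.9999, 0.99999):
        print(f2q, runtime_in_seconds(100, 200, f2q, eps=1e-3))
```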

The two-qubit gate fidelities required to reduce runtimes into a practical region will most likely require error correction. Performing quantum error correction requires an overhead which increases these runtimes. In designing quantum error correction protocols, it is essential that the improvement in gate fidelities is not outweighed by the increase in estimation runtime. The proposed model gives a means of quantifying this trade-off: the product of gate infidelity and (error-corrected) gate time should decrease as useful error correction is incorporated. In practice, there are many subtleties which should be accounted for to make a more rigorous statement. These include considering the variation in gate fidelities among gates in the circuit and the varying time costs of different types of gates. Nevertheless, the cost analyses afforded by this simple model may be a useful tool in the design of quantum gates, quantum chips, error correcting schemes, and noise mitigation schemes.

Appendix A. Ancilla-Based Scheme

In this appendix, we present an alternative scheme, called the ancilla-based scheme. In this scheme, the engineered likelihood function (ELF) is generated by the quantum circuit in FIG. 27 , in which x = (x₁, x₂, ..., x_(2L-1), x_(2L)) ∈ ℝ^(2L) are tunable parameters.

Assuming the circuit in FIG. 27 is noiseless, the engineered likelihood function is given by

$\begin{matrix} {{\mathbb{P}}\left( {d|\theta;\overset{\rightarrow}{x}} \right) = \frac{1}{2}\left\lbrack {1 + \left( {- 1} \right)^{d}\Lambda\left( {\theta;\overset{\rightarrow}{x}} \right)} \right\rbrack,\quad\forall d \in \left\{ {0,1} \right\}} & \text{(96)} \end{matrix}$

where

$\begin{matrix} {\text{Λ}\left( {\theta;\overset{\rightarrow}{x}} \right) = {Re}\left( \left\langle {A\left| {Q\left( {\theta;\overset{\rightarrow}{x}} \right)} \right|A} \right\rangle \right)} & \text{­­­(97)} \end{matrix}$

is the bias of the likelihood function. It turns out that most of the argument in Section 3.1 still holds in the ancilla-based case, except that Δ(θ; x) is replaced with Λ(θ; x). So we will use the same notation (e.g. |0〉, |1〉, X, Y, Z, I) as before, unless otherwise stated. In particular, when we take the errors in the circuit in FIG. 27 into account, the noisy likelihood function is given by

$\begin{matrix} {{\mathbb{P}}\left( {d|\theta;f,\overset{\rightarrow}{x}} \right) = \frac{1}{2}\left\lbrack {1 + \left( {- 1} \right)^{d}f\Lambda\left( {\theta;\overset{\rightarrow}{x}} \right)} \right\rbrack,\quad\forall d \in \left\{ {0,1} \right\}} & \text{(98)} \end{matrix}$

where ƒ is the fidelity of the process for generating the ELF. Note, however, that there is a difference between Δ(θ; x) and Λ(θ; x): the former is trigono-multiquadratic in x, while the latter is trigono-multilinear in x.

We will tune the circuit angles x and perform Bayesian inference with the resultant ELFs in the same way as in Section 3.2. In fact, the argument in Section 3.2 still holds in the ancilla-based case, except that we need to replace Δ(θ; x) with Λ(θ; x). So we will use the same notation as before, unless otherwise stated. In particular, we also define the variance reduction factor ν(µ, σ; ƒ, x) as in Eqs. (37) and (38), replacing Δ(θ; x) with Λ(θ; x). It can be shown that

$\begin{matrix} {\mathcal{V}\left( {\mu,\sigma;f,\overset{\rightarrow}{x}} \right) \approx \mathcal{J}\left( {\mu;f,\overset{\rightarrow}{x}} \right) = \frac{f^{2}\left( {\Lambda^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right)} \right)^{2}}{1 - f^{2}\left( {\Lambda\left( {\mu;\overset{\rightarrow}{x}} \right)} \right)^{2}},\mspace{6mu}\text{when}\mspace{6mu}\sigma\mspace{6mu}\text{is}\mspace{6mu}\text{small},} & \text{(99)} \end{matrix}$

and

$\begin{matrix} {\mathcal{V}\left( {\mu,\sigma;f,\overset{\rightarrow}{x}} \right) \approx f^{2}\left( {\text{Λ}^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right)} \right)^{2},\mspace{6mu}\text{when}\mspace{6mu}\text{both}\mspace{6mu}\sigma\mspace{6mu}\text{and}\mspace{6mu} f\mspace{6mu}\text{are}\mspace{6mu}\text{small}.} & \text{­­­(100)} \end{matrix}$

Namely, the Fisher information and slope of the likelihood function ℙ(d|θ; ƒ, x) at θ = µ are two proxies of the variance reduction factor ν(µ, σ; ƒ, x) under reasonable assumptions. Since the direct optimization of V is hard in general, we will tune the parameters x by optimizing these proxies instead.
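As a numerical illustration of Eqs. (98) to (100), the following Python sketch computes the Fisher information of the two-outcome likelihood directly from its definition and compares it with the closed form of Eq. (99). The function names are hypothetical and serve only this illustration; `bias` and `dbias` stand for Λ(µ; x) and Λ′(µ; x), which are assumed to be supplied by the caller.

```python
def likelihood(d, bias, f):
    """P(d | theta) = (1 + (-1)^d * f * bias) / 2, following Eq. (98)."""
    return 0.5 * (1.0 + (-1) ** d * f * bias)


def fisher_information(bias, dbias, f):
    """Closed form of Eq. (99): f^2 * bias'^2 / (1 - f^2 * bias^2)."""
    return (f * dbias) ** 2 / (1.0 - (f * bias) ** 2)


def fisher_information_from_definition(bias, dbias, f):
    """Sum over both outcomes d of (dP/dtheta)^2 / P."""
    total = 0.0
    for d in (0, 1):
        p = likelihood(d, bias, f)
        dp = 0.5 * (-1) ** d * f * dbias  # derivative of Eq. (98) in theta
        total += dp ** 2 / p
    return total


def slope_proxy(dbias, f):
    """Small-fidelity proxy of Eq. (100): f^2 * bias'^2."""
    return (f * dbias) ** 2
```

For small ƒ the denominator of Eq. (99) approaches 1, so the two proxies agree, which is consistent with the condition stated for Eq. (100).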

A.1. Efficient Maximization of Proxies of the Variance Reduction Factor

Now we present efficient heuristic algorithms for maximizing two proxies of the variance reduction factor V, namely the Fisher information and the slope of the likelihood function ℙ(d|θ; ƒ, x). All of these algorithms make use of the following procedures for evaluating the CSD coefficient functions of the bias Λ(θ; x) and its derivative Λ′(θ;x) with respect to x_(j) for j = 1,2, ...,2L.

A.1.1. Evaluating the CSD Coefficient Functions of the Bias and its Derivative

Since Λ(θ; x) is trigono-multilinear in x, for any j ∈ {1,2, ...,2L}, there exist functions C_(j)(θ; x _(¬j)) and S_(j)(θ; x _(¬j)), which are trigono-multilinear in x _(¬j) = (x₁, ..., x_(j-1), x_(j+1), ..., x_(2L)), such that

$\begin{matrix} {\text{Λ}\left( {\theta;\overset{\rightarrow}{x}} \right) = C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( x_{j} \right) + S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( x_{j} \right).} & \text{­­­(101)} \end{matrix}$

It follows that

$\begin{matrix} {\text{Λ}^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = C_{j}{}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( x_{j} \right) + S_{j}{}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( x_{j} \right)} & \text{­­­(102)} \end{matrix}$

is also trigono-multilinear in x, where C′_(j)(θ;x _(¬j)) = ∂_(θ)C_(j)(θ; x _(¬j)) and S′_(j)(θ; x _(¬j)) = ∂_(θ)S_(j)(θ; x _(¬j)) are the derivatives of C_(j)(θ; x _(¬j)) and S_(j)(θ; x _(¬j)) with respect to θ, respectively.
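The decomposition in Eq. (101) can be checked numerically with a short Python sketch. It assumes the single-qubit representation P(θ) = cos(θ)Z̄ + sin(θ)X̄, U(θ; x) = cos(x)Ī - i sin(x)P(θ), V(x) = cos(x)Ī - i sin(x)Z̄ and Λ(θ; x) = Re〈0̄|Q(θ; x)|0̄〉, as these operators can be read off from the equations of this description; the function names are hypothetical. Rather than the prefix and suffix products of Lemma 2, the sketch recovers C_(j) and S_(j) by probing x_(j) = 0 and x_(j) = π/2, which follows directly from Eq. (101).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)


def P(theta):
    return np.cos(theta) * Z + np.sin(theta) * X


def U(theta, x):
    return np.cos(x) * I2 - 1j * np.sin(x) * P(theta)


def V(x):
    return np.cos(x) * I2 - 1j * np.sin(x) * Z


def Q(theta, xs):
    """Q = V(x_2L) U(theta; x_2L-1) ... V(x_2) U(theta; x_1); xs[k] holds x_(k+1)."""
    out = I2
    for k in range(0, len(xs), 2):
        out = V(xs[k + 1]) @ U(theta, xs[k]) @ out
    return out


def bias(theta, xs):
    """Assumed bias Lambda(theta; x) = Re <0|Q|0>."""
    return Q(theta, xs)[0, 0].real


def csd_coefficients(theta, xs, j):
    """C_j and S_j of Eq. (101), found by probing x_j (1-based index j)."""
    probe = np.array(xs, dtype=float)
    probe[j - 1] = 0.0          # cos = 1, sin = 0, so the bias equals C_j
    c = bias(theta, probe)
    probe[j - 1] = np.pi / 2    # cos = 0, sin = 1, so the bias equals S_j
    s = bias(theta, probe)
    return c, s
```

Each probe costs one product of 2L two-by-two matrices, so this check also runs in time linear in L.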

Our optimization algorithms require efficient procedures for evaluating C_(j)(θ; x _(¬j)), S_(j)(θ; x _(¬j)), C_(j)′(θ; x _(¬j)) and S_(j)′(θ; x _(¬j)) for given θ and x _(¬j). It turns out that these tasks can be accomplished in O(L) time.

Lemma 2. Given θ and x _(¬j), each of C_(j)(θ; x _(¬j)), S_(j)(θ; x _(¬j)), C_(j)′(θ; x_(¬j)) and S_(j)′(θ; x _(¬j)) can be computed in O(L) time.

Proof. For convenience, we introduce the following notation. Let W_(2i) = V(x_(2L-2i)), W_(2i+1) = U(θ;x_(2L-2i-1)), for i = 0,1, ...,L - 1. Furthermore, let W′_(j) = ∂_(θ)W_(j) for j = 0,1, ...,2L - 1. Note that W′_(j) = 0 if j is even. Then we define P_(a,b) = W_(a)W_(a+1) ... W_(b) if 0 ≤ a ≤ b ≤ 2L - 1, and P_(a,b) = I otherwise.

With this notation, it can be shown that

$\begin{matrix} {Q\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{0,a - 1}W_{a}P_{a + 1,2L - 1},\mspace{6mu}\forall 0 \leq a \leq 2L - 1,} & \text{­­­(103)} \end{matrix}$

and

$\begin{matrix} \begin{array}{l} {Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{0,0}W^{\prime}_{1}P_{2,2L - 1} + P_{0,2}W^{\prime}_{3}P_{4,2L - 1}} \\ {+ \cdots + P_{0,2L - 4}W^{\prime}_{2L - 3}P_{2L - 2,2L - 1} + P_{0,2L - 2}W^{\prime}_{2L - 1}.} \end{array} & \text{(104)} \end{matrix}$

In order to evaluate C_(j)(θ;x _(¬j)), S_(j)(θ;x _(¬j)), C_(j)′(θ;x _(¬j)) and S_(j)′(θ;x _(¬j)) for given θ and x _(¬j), we consider the case j is even and the case j is odd separately.

-   ● Case 1: j = 2(L - t) is even, where 0 ≤ t ≤ L - 1. In this case, W_(2t) = V(x_(j)). Using the fact

$\begin{matrix} {Q\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{0,2t - 1}W_{2t}P_{2t + 1,2L - 1}} & \text{(105)} \end{matrix}$

$\begin{matrix} {= P_{0,2t - 1}\left( {\cos\left( x_{j} \right)\overline{I} - \text{i}\mspace{6mu}\sin\left( x_{j} \right)\overline{Z}} \right)P_{2t + 1,2L - 1}} & \text{(106)} \end{matrix}$

$\begin{matrix} {= \cos\left( x_{j} \right)P_{0,2t - 1}P_{2t + 1,2L - 1} - \text{i}\mspace{6mu}\sin\left( x_{j} \right)P_{0,2t - 1}\overline{Z}P_{2t + 1,2L - 1},} & \text{(107)} \end{matrix}$

we obtain

$\begin{matrix} {\text{Λ}\left( {\theta;\overset{\rightarrow}{x}} \right) = C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( x_{j} \right) + S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( x_{j} \right),} & \text{(108)} \end{matrix}$

where

$\begin{matrix} {C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = {Re}\left( \left\langle {\overline{0}\left| {P_{0,2t - 1}P_{2t + 1,2L - 1}} \right|\overline{0}} \right\rangle \right),} & \text{(109)} \end{matrix}$

$\begin{matrix} {S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = {Im}\left( \left\langle {\overline{0}\left| {P_{0,2t - 1}\overline{Z}P_{2t + 1,2L - 1}} \right|\overline{0}} \right\rangle \right).} & \text{(110)} \end{matrix}$

Given θ and x _(¬j), we first compute P_(0,2t-1) and P_(2t+1,2L-1) in O(L) time. Then we calculate C_(j)(θ; x _(¬j)) and S_(j)(θ; x _(¬j)) by Eqs. (109) and (110). This procedure takes only O(L) time.

Next, we describe how to compute C′_(j)(θ;x _(¬j)) and S′_(j)(θ;x _(¬j)). Using Eq. (104) and the fact P_(a,b) = P_(a,2t-1)W_(2t)P_(2t+1,b) for any a ≤ 2t ≤ b, we obtain

$\begin{matrix} \begin{matrix} {Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{0,0}{W^{\prime}}_{\mspace{6mu}\mspace{6mu} 1}P_{2,2t - 1}W_{2t}P_{2t + 1,2L - 1}} \\ {+ P_{0,2}{W^{\prime}}_{\mspace{6mu}\mspace{6mu} 3}P_{4,2t - 1}W_{2t}P_{2t + 1,2L - 1}} \\ {+ \cdots} \\ {+ P_{0,2t - 2}{W^{\prime}}_{\mspace{6mu}\mspace{6mu} 2t - 1}W_{2t}P_{2t + 1,2L - 1}} \\ {+ P_{0,2t - 1}W_{2t}{W^{\prime}}_{\mspace{6mu}\mspace{6mu} 2t + 1}P_{2t + 2,2L - 1}} \\ {+ \cdots} \\ {+ P_{0,2t - 1}W_{2t}P_{2t + 1,2L - 4}{W^{\prime}}_{\mspace{6mu}\mspace{6mu} 2L - 3}P_{2L - 2,2L - 1}} \\ {+ P_{0,2t - 1}W_{2t}P_{2t + 1,2L - 2}{W^{\prime}}_{\mspace{6mu}\mspace{6mu} 2L - 1}.} \end{matrix} & \text{­­­(111)} \end{matrix}$

Let

$\begin{matrix} {A_{t} = {\sum\limits_{s = 1}^{t}P_{0,2s - 2}}\mspace{6mu}{W^{\prime}}_{\mspace{6mu}\mspace{6mu} 2s - 1}P_{2s,2t - 1}} & \text{­­­(112)} \end{matrix}$

$\begin{matrix} {= {\sum\limits_{s = 1}^{t}P_{0,2s - 2}}U^{\prime}\left( {\theta;x_{2L - 2s + 1}} \right)P_{2s,2t - 1},} & \text{­­­(113)} \end{matrix}$

$\begin{matrix} {B_{t} = {\sum\limits_{s = t + 1}^{L}{P_{2t + 1,2s - 2}\mspace{6mu} W\prime_{2s - 1}P_{2s,2L - 1}}}} & \text{­­­(114)} \end{matrix}$

$\begin{matrix} {= {\sum\limits_{s = t + 1}^{L}{P_{2t + 1,2s - 2}\mspace{6mu} U\prime\mspace{6mu}\left( {\theta;\mspace{6mu} x_{2L - 2s + 1}} \right)P_{2s,2L - 1}.}}} & \text{­­­(115)} \end{matrix}$

Then Eq. (111) yields

$\begin{matrix} {Q\prime\left( {\theta;\mspace{6mu}\overset{\rightarrow}{x}} \right) = A_{t}W_{2t}P_{2t + 1,2L - 1} + P_{0,2t - 1}\mspace{6mu} W_{2t}B_{t}} & \text{­­­(116)} \end{matrix}$

$\begin{matrix} \begin{array}{l} {= A_{t}\left( {\cos\left( x_{j} \right)\overline{I} - \text{i}\mspace{6mu}\sin\left( x_{j} \right)\overline{Z}} \right)P_{2t + 1,2L - 1}} \\ {+ P_{0,2t - 1}\left( {\cos\left( x_{j} \right)\overline{I} - \text{i}\mspace{6mu}\sin\left( x_{j} \right)\overline{Z}} \right)B_{t}} \end{array} & \text{(117)} \end{matrix}$

$\begin{matrix} \begin{array}{l} {= \cos\mspace{6mu}\left( x_{j} \right)\mspace{6mu}\left( {A_{t}P_{2t + 1,2L - 1} + P_{0,2t - 1}B_{t}} \right)} \\ {- \text{i sin}\left( x_{j} \right)\mspace{6mu}\left( {A_{t}\overline{Z}P_{2t + 1,2L - 1} + P_{0,2t - 1}\overline{Z}B_{t}} \right),} \end{array} & \text{­­­(118)} \end{matrix}$

which leads to

$\begin{matrix} {\Lambda\prime\left( {\theta;\mspace{6mu}\overset{\rightarrow}{x}} \right) = C\prime_{j}\left( {\theta;\mspace{6mu}{\overset{\rightarrow}{x}}_{\neg j}} \right)\mspace{6mu}\cos\mspace{6mu}\left( x_{j} \right) + S\prime_{j}\left( {\theta;\mspace{6mu}{\overset{\rightarrow}{x}}_{\neg j}} \right)\mspace{6mu}\sin\mspace{6mu}\left( x_{j} \right),} & \text{­­­(119)} \end{matrix}$

where

$\begin{matrix} {C\prime_{j}\left( {\theta;\mspace{6mu}{\overset{\rightarrow}{x}}_{\neg j}} \right) = \text{Re}\left( \left\langle {\overline{0}\left| \left( {A_{t}P_{2t + 1,2L - 1} + P_{0,2t - 1}B_{t}} \right) \right|\overline{0}} \right\rangle \right),} & \text{­­­(120)} \end{matrix}$

$\begin{matrix} {S\prime_{j}\left( {\theta;\mspace{6mu}{\overset{\rightarrow}{x}}_{\neg j}} \right) = {Im}\left( \left\langle {\overline{0}\left| \left( {A_{t}\overline{Z}P_{2t + 1,2L - 1} + P_{0,2t - 1}\overline{Z}B_{t}} \right) \right|\overline{0}} \right\rangle \right).} & \text{­­­(121)} \end{matrix}$

Given θ and x _(¬j), we first compute the following matrices in a total of O(L) time by standard dynamic programming techniques:

-   ● P_(0,2s-2) and P_(2s,2t-1) for s = 1, 2, ..., t;
-   ● P_(2t+1,2s-2) and P_(2s,2L-1) for s = t + 1, t + 2, ..., L;
-   ● P_(0,2t-1) and P_(2t+1,2L-1).

Then we compute A_(t) and B_(t) by Eqs. (113) and (115). After that, we calculate C′_(j)(θ; x _(¬j)) and S′_(j)(θ;x _(¬j)) by Eqs. (120) and (121). Overall, this procedure takes O(L) time.
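The even-j procedure above can be sketched in Python as follows. The sketch assumes the same single-qubit representation as the equations above (P(θ) = cos(θ)Z̄ + sin(θ)X̄, Λ = Re〈0̄|Q|0̄〉), and all function names are hypothetical. It also relies on an observation not stated in the text: by the product rule, and because W′_(k) vanishes for even k, A_(t) and B_(t) equal the θ-derivatives of P_(0,2t-1) and P_(2t+1,2L-1), so one left-to-right pass yields each product together with its derivative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)


def factors(theta, xs):
    """W[2i] = V(x_(2L-2i)), W[2i+1] = U(theta; x_(2L-2i-1)), plus theta-derivatives."""
    L = len(xs) // 2
    P = np.cos(theta) * Z + np.sin(theta) * X
    dP = -np.sin(theta) * Z + np.cos(theta) * X
    W, dW = [], []
    for i in range(L):
        xv = xs[2 * L - 2 * i - 1]  # x_(2L-2i), stored 0-based
        xu = xs[2 * L - 2 * i - 2]  # x_(2L-2i-1), stored 0-based
        W.append(np.cos(xv) * I2 - 1j * np.sin(xv) * Z)
        dW.append(np.zeros((2, 2), dtype=complex))
        W.append(np.cos(xu) * I2 - 1j * np.sin(xu) * P)
        dW.append(-1j * np.sin(xu) * dP)
    return W, dW


def product_and_derivative(W, dW, a, b):
    """P_(a,b) and its theta-derivative; identity and zero when a > b."""
    prod, dprod = I2, np.zeros((2, 2), dtype=complex)
    for k in range(a, b + 1):
        dprod = dprod @ W[k] + prod @ dW[k]  # product rule, uses the old prod
        prod = prod @ W[k]
    return prod, dprod


def bias(theta, xs):
    W, dW = factors(theta, xs)
    return product_and_derivative(W, dW, 0, len(W) - 1)[0][0, 0].real


def derivative_coefficients_even(theta, xs, j):
    """C'_j and S'_j of Eqs. (120)-(121) for even j = 2(L - t)."""
    L = len(xs) // 2
    t = L - j // 2
    W, dW = factors(theta, xs)
    left, A = product_and_derivative(W, dW, 0, 2 * t - 1)
    right, B = product_and_derivative(W, dW, 2 * t + 1, 2 * L - 1)
    c = (A @ right + left @ B)[0, 0].real
    s = (A @ Z @ right + left @ Z @ B)[0, 0].imag
    return c, s
```

A finite-difference estimate of Λ′(θ; x) should agree with C′_(j) cos(x_(j)) + S′_(j) sin(x_(j)) for every even j, which gives a simple consistency check on Eq. (119).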

-   ● Case 2: j = 2(L - t) - 1 is odd, where 0 ≤ t ≤ L - 1. In this case, W_(2t+1) = U(θ; x_(j)). Using the fact

$\begin{matrix} {Q\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{0,2t}W_{2t + 1}P_{2t + 2,2L - 1}} & \text{(122)} \end{matrix}$

$\begin{matrix} {= P_{0,2t}\left( {\cos\left( x_{j} \right)\overline{I} - \text{i}\mspace{6mu}\sin\left( x_{j} \right)P(\theta)} \right)P_{2t + 2,2L - 1}} & \text{(123)} \end{matrix}$

$\begin{matrix} {= \cos\left( x_{j} \right)P_{0,2t}P_{2t + 2,2L - 1} - \text{i}\mspace{6mu}\sin\left( x_{j} \right)P_{0,2t}P(\theta)P_{2t + 2,2L - 1},} & \text{(124)} \end{matrix}$

we obtain

$\begin{matrix} {\Lambda\left( {\theta;\overset{\rightarrow}{x}} \right) = C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( x_{j} \right) + S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( x_{j} \right),} & \text{(125)} \end{matrix}$

where

$\begin{matrix} {C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = {Re}\left( \left\langle {\overline{0}\left| {P_{0,2t}P_{2t + 2,2L - 1}} \right|\overline{0}} \right\rangle \right),} & \text{(126)} \end{matrix}$

$\begin{matrix} {S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = {Im}\left( \left\langle {\overline{0}\left| {P_{0,2t}P(\theta)P_{2t + 2,2L - 1}} \right|\overline{0}} \right\rangle \right).} & \text{(127)} \end{matrix}$

Given θ and x _(¬j), we first compute P_(0,2t) and P_(2t+2,2L-1) in O(L) time. Then we calculate C_(j)(θ; x _(¬j)) and S_(j)(θ; x _(¬j)) by Eqs. (126) and (127). This procedure takes only O(L) time.

Next, we describe how to compute C′_(j)(θ;x _(¬j)) and S′_(j)(θ;x _(¬j)). Using Eq. (104) and the fact P_(a,b) = P_(a,2t)W_(2t+1)P_(2t+2,b) for any a ≤ 2t + 1 ≤ b, we get

$\begin{matrix} \begin{array}{l} {Q\prime\left( {\theta;\mspace{6mu}\overset{\rightarrow}{x}} \right) = P_{0,0}W\prime_{\mspace{6mu} 1}P_{2,2t}W_{2t + 1}P_{2t + 2,2L - 1}} \\ {+ P_{0,2}W\prime_{\mspace{6mu} 3}P_{4,2t}W_{2t + 1}P_{2t + 2,2L - 1}} \\ {+ \mspace{6mu}\cdots} \\ {+ P_{0,2t - 2}W\prime_{\mspace{6mu} 2t - 1}P_{2t,2t}W_{2t + 1}P_{2t + 2,2L - 1}} \\ {+ P_{0,2t}W\prime_{\mspace{6mu} 2t + 1}P_{2t + 2,2L - 1}} \\ {+ P_{0,2t}W_{2t + 1}P_{2t + 2,2t + 2}W\prime_{\mspace{6mu} 2t + 3}P_{2t + 4,2L - 1}} \\ {+ \mspace{6mu}\cdots} \\ {+ P_{0,2t}W_{2t + 1}P_{2t + 2,2L - 2}W\prime_{\mspace{6mu} 2L - 1}.} \end{array} & \text{(128)} \end{matrix}$

Let

$\begin{matrix} {A_{t} = {\sum\limits_{s = 1}^{t}{P_{0,2s - 2}\mspace{6mu} W\prime_{\mspace{6mu} 2s - 1}P_{2s,2t}}}} & \text{­­­(129)} \end{matrix}$

$\begin{matrix} {= {\sum\limits_{s = 1}^{t}{P_{0,2s - 2}\mspace{6mu} U\prime\mspace{6mu}\left( {\theta;\mspace{6mu} x_{2L - 2s + 1}} \right)P_{2s,2t},}}} & \text{­­­(130)} \end{matrix}$

$\begin{matrix} {B_{t} = {\sum\limits_{s = t + 2}^{L}{P_{2t + 2,2s - 2}\mspace{6mu} W\prime_{\mspace{6mu} 2s - 1}P_{2s,2L - 1}}}} & \text{­­­(131)} \end{matrix}$

$\begin{matrix} {= {\sum\limits_{s = t + 2}^{L}{P_{2t + 2,2s - 2}\mspace{6mu} U\prime\mspace{6mu}\left( {\theta;\mspace{6mu} x_{2L - 2s + 1}} \right)P_{2s,2L - 1}.}}} & \text{­­­(132)} \end{matrix}$

Then Eq. (128) yields

$\begin{matrix} \begin{array}{l} {Q\prime\mspace{6mu}\left( {\theta;\mspace{6mu}\overset{\rightarrow}{x}} \right) = A_{t}W_{2t + 1}P_{2t + 2,2L - 1}\mspace{6mu} + \mspace{6mu} P_{0,2t}W\prime_{\mspace{6mu} 2t + 1}P_{2t + 2,2L - 1}} \\ {+ \mspace{6mu} P_{0,2t}W_{2t + 1}B_{t}} \end{array} & \text{­­­(133)} \end{matrix}$

$\begin{matrix} \begin{array}{l} {= A_{t}\left( {\cos\left( x_{j} \right)\overline{I} - \text{i}\mspace{6mu}\sin\left( x_{j} \right)P(\theta)} \right)P_{2t + 2,2L - 1}} \\ {- \text{i}\mspace{6mu}\sin\left( x_{j} \right)P_{0,2t}P^{\prime}(\theta)P_{2t + 2,2L - 1}} \\ {+ P_{0,2t}\left( {\cos\left( x_{j} \right)\overline{I} - \text{i}\mspace{6mu}\sin\left( x_{j} \right)P(\theta)} \right)B_{t}} \end{array} & \text{(134)} \end{matrix}$

$\begin{matrix} \begin{array}{l} {= \mspace{6mu}\cos\mspace{6mu}\left( x_{j} \right)\mspace{6mu}\left( {A_{t}P_{2t + 2,2L - 1} + P_{0,2t}B_{t}} \right)} \\ {- \text{i sin}\left( x_{j} \right)\mspace{6mu}\left( {A_{t}P(\theta)P_{2t + 2,2L - 1} + P_{0,2t}P\prime\mspace{6mu}(\theta)P_{2t + 2,2L - 1} + P_{0,2t}P(\theta)B_{t}} \right),} \end{array} & \text{­­­(135)} \end{matrix}$

which leads to

$\begin{matrix} {\Lambda\prime\mspace{6mu}\left( {\theta;\mspace{6mu}\overset{\rightarrow}{x}} \right) = C\prime_{j}\left( {\theta;\mspace{6mu}{\overset{\rightarrow}{x}}_{\neg j}} \right)\mspace{6mu}\cos\mspace{6mu}\left( x_{j} \right)\mspace{6mu} + \mspace{6mu} S\prime_{j}\left( {\theta;\mspace{6mu}{\overset{\rightarrow}{x}}_{\neg j}} \right)\mspace{6mu}\sin\mspace{6mu}\left( x_{j} \right),} & \text{­­­(136)} \end{matrix}$

where

$\begin{matrix} {C\prime_{j}\left( {\theta;\mspace{6mu}{\overset{\rightarrow}{x}}_{\neg j}} \right) = {Re}\left( \left\langle {\overline{0}\left| \left( {A_{t}P_{2t + 2,2L - 1} + P_{0,2t}B_{t}} \right) \right|\overline{0}} \right\rangle \right),} & \text{­­­(137)} \end{matrix}$

$\begin{matrix} {S\prime_{j}\left( {\theta;\mspace{6mu}{\overset{\rightarrow}{x}}_{\neg j}} \right) = {Im}\left( \left\langle {\overline{0}\left| \left( {A_{t}P(\theta)P_{2t + 2,2L - 1} + P_{0,2t}P\prime(\theta)P_{2t + 2,2L - 1} + P_{0,2t}P(\theta)B_{t}} \right) \right|\overline{0}} \right\rangle \right).} & \text{­­­(138)} \end{matrix}$

Given θ and x _(¬j), we first compute the following matrices in a total of O(L) time by standard dynamic programming techniques:

-   P_(0,2s-2) and P_(2s,2t) for s = 1, 2, ..., t;
-   P_(2t+2,2s-2) and P_(2s,2L-1) for s = t + 2, t + 3, ..., L;
-   P_(0,2t) and P_(2t+2,2L-1).

Then we compute A_(t) and B_(t) by Eqs. (130) and (132). After that, we calculate C′_(j)(θ;x _(¬j)) and S′_(j)(θ;x _(¬j)) by Eqs. (137) and (138). Overall, this procedure takes O(L) time.


A.1.2. Maximizing the Fisher Information of the Likelihood Function

We propose two algorithms for maximizing the Fisher information of the likelihood function ℙ(d|θ; ƒ, x) at a given point θ = µ (i.e. the prior mean of θ). Namely, our goal is to find x ∈ ℝ^(2L) that maximizes

$\begin{matrix} {\mathcal{J}\left( {\theta;\mspace{6mu} f,\mspace{6mu}\overset{\rightarrow}{x}} \right) = \frac{f^{2}\left( {\Lambda^{\prime}\left( {\theta;\mspace{6mu}\overset{\rightarrow}{x}} \right)} \right)^{2}}{1 - f^{2}\left( {\Lambda\left( {\theta;\mspace{6mu}\overset{\rightarrow}{x}} \right)} \right)^{2}}.} & \text{(139)} \end{matrix}$

These algorithms are similar to Algorithms 1 and 2 for Fisher information maximization in the ancilla-free case, in the sense that they are also based on gradient ascent and coordinate ascent, respectively. The main difference is that now we invoke the procedures in Lemma 2 to evaluate C_(j)(µ;x _(¬j)), S_(j)(µ;x _(¬j)), C′_(j)(µ;x _(¬j)) and S′_(j)(µ;x _(¬j)) for given µ and x _(¬j), and then use them to either compute the partial derivative of 𝒥(µ;ƒ, x) with respect to x_(j) (in gradient ascent) or define a single-variable optimization problem for x_(j) (in coordinate ascent). These algorithms are formally described in Algorithms 5 and 6.

A.1.3. Maximizing the Slope of the Likelihood Function

We also propose two algorithms for maximizing the slope of the likelihood function ℙ(d|θ; ƒ, x) at a given point θ = µ (i.e. the prior mean of θ). Namely, our goal is to find x ∈ ℝ^(2L) that maximizes |ℙ′(d|θ;ƒ, x)| = ƒ|Λ′(θ;x)|/2.

These algorithms are similar to Algorithms 3 and 4 for slope maximization in the ancilla-free case, in the sense that they are also based on gradient ascent and coordinate ascent, respectively. The main difference is that now we invoke the procedures in Lemma 2 to evaluate C′_(j)(µ;x _(¬j)) and S′_(j)(µ;x _(¬j)) for given µ and x _(¬j). Then we use these quantities to either compute the partial derivative of (Λ′(µ;x))² with respect to x_(j) (in gradient ascent) or directly update the value of x_(j) (in coordinate ascent). These algorithms are formally described in Algorithms 7 and 8.
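Because Λ′(µ; x) = C′_(j) cos(x_(j)) + S′_(j) sin(x_(j)) by Eq. (102), both update rules admit short closed forms. The Python sketch below is an illustration under that equation only; Algorithms 7 and 8 are not reproduced in this section, so the function names and the exact update shown here are assumptions rather than the formal algorithms.

```python
import math


def best_angle_for_slope(c_prime, s_prime):
    """Angle x_j maximizing |c' cos(x) + s' sin(x)| and the maximum attained.

    Writing (c', s') in polar form gives c' cos(x) + s' sin(x) = R cos(x - phi)
    with R = hypot(c', s') and phi = atan2(s', c'), so x = phi attains R.
    """
    return math.atan2(s_prime, c_prime), math.hypot(c_prime, s_prime)


def squared_slope_gradient(c_prime, s_prime, x):
    """Partial derivative of (c' cos(x) + s' sin(x))^2 with respect to x."""
    value = c_prime * math.cos(x) + s_prime * math.sin(x)
    dvalue = -c_prime * math.sin(x) + s_prime * math.cos(x)
    return 2.0 * value * dvalue
```

In a coordinate-ascent sweep, one would recompute C′_(j) and S′_(j) for each j in turn and set x_(j) to the returned angle; in gradient ascent, the second function supplies the j-th component of the gradient.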

A.2. Approximate Bayesian Inference With Engineered Likelihood Functions

With the algorithms for tuning the circuit parameters x in place, we now briefly describe how to perform Bayesian inference efficiently with the resultant likelihood functions. The idea is similar to the one in Section 4.2 for the ancilla-free scheme.

Suppose θ has prior distribution N(µ,σ²), where σ ≪ 1/L, and the fidelity of the process for generating the ELF is ƒ. We find that the parameters x = (x₁, x₂, ..., x_(2L)) that maximize 𝒥(µ;ƒ,x) (or |Λ′(µ;x)|) satisfy the following property: when θ is close to µ, i.e. θ ∈ [µ - O(σ), µ + O(σ)], we have

$\begin{matrix} {{\mathbb{P}}\left( {d\left| {\theta;\mspace{6mu} f,\mspace{6mu}\overset{\rightarrow}{x}} \right)} \right) \approx \frac{1 + \left( {- 1} \right)^{d}f\mspace{6mu} sin\mspace{6mu}\left( {r\theta + b} \right)}{2}} & \text{­­­(147)} \end{matrix}$

for some r, b ∈ ℝ. We find the best-fitting r and b by solving the following least squares problem:

$\begin{matrix} {\left( {r*,b*} \right) = \underset{r,b}{\arg\mspace{6mu}\min}{\sum\limits_{\theta \in \Theta}{\left| {\arcsin\mspace{6mu}\left( {\Lambda\left( {\theta;\mspace{6mu}\overset{\rightarrow}{x}} \right)} \right) - r\theta - b} \right|^{2},}}} & \text{­­­(148)} \end{matrix}$

where Θ = {θ₁, θ₂, ..., θ_(k)} ⊆ [µ - O(σ), µ + O(σ)]. This least-squares problem has the following analytical solution:

$\begin{matrix} {\left( \begin{array}{l} {r*} \\ {b*} \end{array} \right) = A^{+}z = \left( {A^{T}A} \right)^{- 1}A^{T}z,} & \text{­­­(149)} \end{matrix}$

where

$\begin{matrix} {A = \begin{pmatrix} \theta_{1} & 1 \\ \theta_{2} & 1 \\  \vdots & \vdots \\ \theta_{k} & 1 \end{pmatrix},\mspace{6mu} z = \begin{pmatrix} {\arcsin\mspace{6mu}\left( {\Lambda\left( {\theta_{1};\mspace{6mu}\overset{\rightarrow}{x}} \right)} \right)} \\ {\arcsin\mspace{6mu}\left( {\Lambda\left( {\theta_{2};\mspace{6mu}\overset{\rightarrow}{x}} \right)} \right)} \\  \vdots \\ {\arcsin\mspace{6mu}\left( {\Lambda\left( {\theta_{k};\mspace{6mu}\overset{\rightarrow}{x}} \right)} \right)} \end{pmatrix}.} & \text{­­­(156)} \end{matrix}$

FIG. 34 illustrates an example of the true and fitted likelihood functions.
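The least-squares fit of Eqs. (148) and (149) can be sketched with NumPy as follows. The function name and the sample grid are hypothetical; `bias_values` stands for Λ(θ_(i); x) evaluated on the grid Θ, and `numpy.linalg.lstsq` is used in place of forming the pseudoinverse explicitly, which gives the same minimizer when A has full column rank.

```python
import numpy as np


def fit_sinusoid_phase(thetas, bias_values):
    """Least-squares (r, b) with arcsin(bias(theta)) ~ r * theta + b."""
    thetas = np.asarray(thetas, dtype=float)
    # Design matrix A with rows (theta_i, 1), as in the definition of A above.
    A = np.column_stack([thetas, np.ones_like(thetas)])
    # Clip guards against values marginally outside [-1, 1] from rounding.
    z = np.arcsin(np.clip(bias_values, -1.0, 1.0))
    (r, b), *_ = np.linalg.lstsq(A, z, rcond=None)
    return float(r), float(b)
```

For example, sampling Λ(θ) = sin(3θ - 1.2) on a narrow grid around θ = 0.5 returns r close to 3 and b close to -1.2, since rθ + b stays within the principal branch of arcsin there.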

Once we obtain the optimum r and b, we approximate the posterior mean and variance of θ with the ones for

$\begin{matrix} {{\mathbb{P}}\left( {d\left| {\theta;\mspace{6mu} f} \right)} \right) = \frac{1 + \left( {- 1} \right)^{d}f\mspace{6mu} sin\mspace{6mu}\left( {r\theta + b} \right)}{2},} & \text{­­­(157)} \end{matrix}$

which have analytical formulas. Specifically, suppose θ has prior distribution

N(μ_(k), σ_(k)²)

at round k. Let d_(k) be the measurement outcome and (r_(k), b_(k)) be the best-fitting parameters at this round. Then we approximate the posterior mean and variance of θ by

$\begin{matrix} {\mu_{k + 1} = \mu_{k} + \frac{\left( {- 1} \right)^{d_{k}}f\mspace{6mu} e^{- r_{k}^{2}{\sigma_{k}^{2}/2}}r_{k}\sigma_{k}^{2}\mspace{6mu} cos\mspace{6mu}\left( {r_{k}\mu_{k} + b_{k}} \right)}{1 + \left( {- 1} \right)^{d_{k}}f\mspace{6mu} e^{- r_{k}^{2}{\sigma_{k}^{2}/2}}\mspace{6mu} sin\mspace{6mu}\left( {r_{k}\mu_{k} + b_{k}} \right)},} & \text{­­­(158)} \end{matrix}$

$\begin{matrix} {\sigma_{k + 1}^{2} = \sigma_{k}^{2}\left( {1 - \frac{f r_{k}^{2}\sigma_{k}^{2}e^{- r_{k}^{2}\sigma_{k}^{2}/2}\left\lbrack {f e^{- r_{k}^{2}\sigma_{k}^{2}/2} + \left( {- 1} \right)^{d_{k}}\sin\left( {r_{k}\mu_{k} + b_{k}} \right)} \right\rbrack}{\left\lbrack {1 + \left( {- 1} \right)^{d_{k}}f e^{- r_{k}^{2}\sigma_{k}^{2}/2}\sin\left( {r_{k}\mu_{k} + b_{k}} \right)} \right\rbrack^{2}}} \right).} & \text{(159)} \end{matrix}$

After that, we proceed to the next round, setting

N(μ_(k + 1), σ_(k + 1)²)

as the prior distribution of θ for that round. The approximation errors incurred by Eqs. (158) and (159) are small and have negligible impact on the performance of the whole algorithm for the same reason as in the ancilla-free case.
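One round of the update in Eqs. (158) and (159) can be sketched in Python as follows. The function names are hypothetical; the second function is not part of the described method and only integrates the posterior on a grid so that the closed forms can be checked against it.

```python
import numpy as np


def posterior_update(mu, var, d, f, r, b):
    """Closed-form posterior mean and variance for prior N(mu, var)
    and likelihood (1 + (-1)^d f sin(r theta + b)) / 2."""
    sign = (-1) ** d
    decay = np.exp(-0.5 * r**2 * var)
    phase = r * mu + b
    denom = 1.0 + sign * f * decay * np.sin(phase)
    mu_next = mu + sign * f * decay * r * var * np.cos(phase) / denom
    shrink = f * r**2 * var * decay * (f * decay + sign * np.sin(phase)) / denom**2
    return float(mu_next), float(var * (1.0 - shrink))


def numerical_posterior(mu, var, d, f, r, b, n=200001, width=10.0):
    """Reference values from a uniform-grid sum over mu +/- width * sigma."""
    sigma = np.sqrt(var)
    theta = np.linspace(mu - width * sigma, mu + width * sigma, n)
    weight = np.exp(-0.5 * (theta - mu) ** 2 / var)
    weight = weight * 0.5 * (1.0 + (-1) ** d * f * np.sin(r * theta + b))
    weight /= weight.sum()
    mean = float((theta * weight).sum())
    return mean, float(((theta - mean) ** 2 * weight).sum())
```

The returned pair can be fed back in as the prior for the next round, together with freshly fitted values of r and b.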

C. Proof of Lemma

For convenience, we introduce the following notation. Let W_(2i) = U^(†)(θ;x_(2i+1)) = U(θ; -x_(2i+1)), W_(2i+1) = V^(†)(x_(2i+2)) = V(-x_(2i+2)), W_(4L-2i) = U(θ; x_(2i+1)), and W_(4L-2i-1) = V(x_(2i+2)), for i = 0,1, ..., L - 1, and W_(2L) = P(θ). Furthermore, let W′_(j) = ∂_(θ)W_(j) for j = 0,1, ...,4L. Note that W′_(j) = 0 if j is odd. Then we define P_(a,b) = W_(a)W_(a+1) ... W_(b) if 0 ≤ a ≤ b ≤ 4L, and P_(a,b) = I otherwise.

With this notation,

$Q\left( {\theta;\overset{\rightarrow}{x}} \right)^{\dagger} = P_{0,a - 1}W_{a}P_{a + 1,2L - 1},\forall 0 \leq a \leq 2L - 1,$

$Q\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{2L + 1,b - 1}W_{b}P_{b + 1,4L},\forall 2L + 1 \leq b \leq 4L,$

$\begin{array}{l} {Q\left( {\theta;\overset{\rightarrow}{x}} \right)^{\dagger}P(\theta)Q\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{0,a - 1}W_{a}P_{a + 1,b - 1}W_{b}P_{b + 1,4L},\forall 0 \leq a} \\ {< b \leq 4L.} \end{array}$

Moreover, taking the derivative with respect to θ yields

$\begin{matrix} {Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = \partial_{\theta}Q\left( {\theta;\overset{\rightarrow}{x}} \right)} \\ {= V\left( x_{2L} \right)U^{\prime}\left( {\theta;x_{2L - 1}} \right)V\left( x_{2L - 2} \right)U\left( {\theta;x_{2L - 3}} \right)\ldots V\left( x_{4} \right)U\left( {\theta;x_{3}} \right)V\left( x_{2} \right)U\left( {\theta;x_{1}} \right)} \\ {+ V\left( x_{2L} \right)U\left( {\theta;x_{2L - 1}} \right)V\left( x_{2L - 2} \right)U^{\prime}\left( {\theta;x_{2L - 3}} \right)\ldots V\left( x_{4} \right)U\left( {\theta;x_{3}} \right)V\left( x_{2} \right)U\left( {\theta;x_{1}} \right)} \\ {+ \cdots} \\ {+ V\left( x_{2L} \right)U\left( {\theta;x_{2L - 1}} \right)V\left( x_{2L - 2} \right)U\left( {\theta;x_{2L - 3}} \right)\ldots V\left( x_{4} \right)U^{\prime}\left( {\theta;x_{3}} \right)V\left( x_{2} \right)U\left( {\theta;x_{1}} \right)} \\ {+ V\left( x_{2L} \right)U\left( {\theta;x_{2L - 1}} \right)V\left( x_{2L - 2} \right)U\left( {\theta;x_{2L - 3}} \right)\ldots V\left( x_{4} \right)U\left( {\theta;x_{3}} \right)V\left( x_{2} \right)U^{\prime}\left( {\theta;x_{1}} \right),} \end{matrix}$

where

$\begin{array}{l} {U^{\prime}\left( {\theta;\alpha} \right) = \partial_{\theta}U\left( {\theta;\alpha} \right) = - \text{i sin}(\alpha)P^{\prime}(\theta)} \\ {\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, = \text{i sin}(\alpha)\left( {\sin(\theta)\overline{Z} - \cos(\theta)\overline{X}} \right)} \end{array}$

is the derivative of U(θ; α) with respect to θ, in which

$P^{\prime}(\theta) = - \sin(\theta)\overline{Z} + \cos(\theta)\overline{X}$

is the derivative of P(θ) with respect to θ. It follows that

$\begin{matrix} {Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{2L + 1,2L + 1}{W^{\prime}}_{\,\,\, 2L + 2}P_{2L + 3,4L}} \\ {+ P_{2L + 1,2L + 3}{W^{\prime}}_{\,\,\, 2L + 4}P_{2L + 5,4L}} \\ {+ \ldots} \\ {+ P_{2L + 1,4L - 3}{W^{\prime}}_{\,\,\, 4L - 2}P_{4L - 1,4L}} \\ {+ P_{2L + 1,4L - 1}{W^{\prime}}_{\,\, 4L}.} \end{matrix}$

The following facts will be useful. Suppose A, B and C are arbitrary linear operators on the Hilbert space ℌ = span{|0〉, |1〉}. Then by direct calculation, one can verify that

$\begin{array}{l} {\langle\overline{0}|AV(-x)BV(x)C|\overline{0}\rangle} \\ {= \langle\overline{0}|A\left\lbrack {\cos(x)\overline{I} + \text{i}\sin(x)\overline{Z}} \right\rbrack B\left\lbrack {\cos(x)\overline{I} - \text{i}\sin(x)\overline{Z}} \right\rbrack C|\overline{0}\rangle} \\ {= \frac{1}{2}\Big\lbrack \cos(2x)\langle\overline{0}|A\left( {B - \overline{Z}B\overline{Z}} \right)C|\overline{0}\rangle - \text{i}\sin(2x)\langle\overline{0}|A\left( {B\overline{Z} - \overline{Z}B} \right)C|\overline{0}\rangle} \\ {\quad + \langle\overline{0}|A\left( {B + \overline{Z}B\overline{Z}} \right)C|\overline{0}\rangle \Big\rbrack,} \end{array}$

$\begin{array}{l} {\langle\overline{0}|AU(\theta;-x)BU(\theta;x)C|\overline{0}\rangle} \\ {= \langle\overline{0}|A\left\lbrack {\cos(x)\overline{I} + \text{i}\sin(x)P(\theta)} \right\rbrack B\left\lbrack {\cos(x)\overline{I} - \text{i}\sin(x)P(\theta)} \right\rbrack C|\overline{0}\rangle} \\ {= \frac{1}{2}\Big\lbrack \cos(2x)\langle\overline{0}|A\left( {B - P(\theta)BP(\theta)} \right)C|\overline{0}\rangle - \text{i}\sin(2x)\langle\overline{0}|A\left( {BP(\theta) - P(\theta)B} \right)C|\overline{0}\rangle} \\ {\quad + \langle\overline{0}|A\left( {B + P(\theta)BP(\theta)} \right)C|\overline{0}\rangle \Big\rbrack,} \end{array}$

and

$\begin{array}{l} {\langle\overline{0}|AU(\theta;-x)BU^{\prime}(\theta;x)C|\overline{0}\rangle} \\ {= \langle\overline{0}|A\left\lbrack {\cos(x)\overline{I} + \text{i}\sin(x)P(\theta)} \right\rbrack B\left\lbrack {- \text{i}\sin(x)P^{\prime}(\theta)} \right\rbrack C|\overline{0}\rangle} \\ {= \frac{1}{2}\Big\lbrack - \cos(2x)\langle\overline{0}|AP(\theta)BP^{\prime}(\theta)C|\overline{0}\rangle - \text{i}\sin(2x)\langle\overline{0}|ABP^{\prime}(\theta)C|\overline{0}\rangle} \\ {\quad + \langle\overline{0}|AP(\theta)BP^{\prime}(\theta)C|\overline{0}\rangle \Big\rbrack.} \end{array}$
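The first two identities above share one algebraic form, with the generator Z̄ replaced by P(θ), and can be spot-checked numerically. The Python sketch below uses hypothetical function names and random complex 2×2 matrices for A, B and C; it assumes the representation V(x) = cos(x)Ī - i sin(x)Z̄ and U(θ; x) = cos(x)Ī - i sin(x)P(θ) with P(θ) = cos(θ)Z̄ + sin(θ)X̄.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)


def rotation(G, x):
    """cos(x) I - i sin(x) G; gives V(x) for G = Z and U(theta; x) for G = P(theta)."""
    return np.cos(x) * I2 - 1j * np.sin(x) * G


def sandwich_direct(A, B, C, G, x):
    """<0| A R(-x) B R(x) C |0> evaluated by direct multiplication."""
    return (A @ rotation(G, -x) @ B @ rotation(G, x) @ C)[0, 0]


def sandwich_expanded(A, B, C, G, x):
    """Same quantity via the cos(2x) / sin(2x) / constant expansion."""
    def elem(M):
        return (A @ M @ C)[0, 0]
    return 0.5 * (np.cos(2 * x) * elem(B - G @ B @ G)
                  - 1j * np.sin(2 * x) * elem(B @ G - G @ B)
                  + elem(B + G @ B @ G))
```

Agreement of the two functions for G = Z̄ and for G = P(θ) is what makes the bias trigono-multiquadratic in each x_(j) in the ancilla-free case.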

The following fact will be also useful. Taking the derivative with respect to θ yields

$\begin{array}{l} {\Delta^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = \langle\overline{0}|Q^{\dagger}\left( {\theta;\overset{\rightarrow}{x}} \right)P(\theta)Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)|\overline{0}\rangle} \\ {\quad + \langle\overline{0}|Q^{\dagger}\left( {\theta;\overset{\rightarrow}{x}} \right)P^{\prime}(\theta)Q\left( {\theta;\overset{\rightarrow}{x}} \right)|\overline{0}\rangle} \\ {\quad + \langle\overline{0}|\left( {Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)^{\dagger}P(\theta)Q\left( {\theta;\overset{\rightarrow}{x}} \right)|\overline{0}\rangle} \\ {= 2\,\text{Re}\left( {\langle\overline{0}|Q^{\dagger}\left( {\theta;\overset{\rightarrow}{x}} \right)P(\theta)Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)|\overline{0}\rangle} \right)} \\ {\quad + \langle\overline{0}|Q^{\dagger}\left( {\theta;\overset{\rightarrow}{x}} \right)P^{\prime}(\theta)Q\left( {\theta;\overset{\rightarrow}{x}} \right)|\overline{0}\rangle.} \end{array}$

In order to evaluate C_(j)(θ;x _(¬j)), S_(j)(θ;x _(¬j)), B_(j)(θ;x _(¬j)), C′_(j)(θ;x _(¬j)), S′_(j)(θ;x _(¬j)) and B′_(j)(θ;x _(¬j)) for given θ and x _(¬j), we consider the case j is even and the case j is odd separately.

-   ● Case 1: j = 2(t + 1) is even, where 0 ≤ t ≤ L - 1. In this case, W_(2t+1) = V(-x_(j)), and W_(4L-2t-1) = V(x_(j)). Then we obtain

$\begin{array}{l} {\Delta\left( {\theta;\overset{\rightarrow}{x}} \right) = \langle\overline{0}|P_{0,2t}V\left( {- x_{j}} \right)P_{2t + 2,4L - 2t - 2}V\left( x_{j} \right)P_{4L - 2t,4L}|\overline{0}\rangle} \\ {\quad = C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( {2x_{j}} \right) + S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( {2x_{j}} \right) + B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right),} \end{array}$

where

$C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \frac{1}{2}\langle\overline{0}|P_{0,2t}\left( {P_{2t + 2,4L - 2t - 2} - \overline{Z}P_{2t + 2,4L - 2t - 2}\overline{Z}} \right)P_{4L - 2t,4L}|\overline{0}\rangle,$

$S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = - \frac{\text{i}}{2}\langle\overline{0}|P_{0,2t}\left( {P_{2t + 2,4L - 2t - 2}\overline{Z} - \overline{Z}P_{2t + 2,4L - 2t - 2}} \right)P_{4L - 2t,4L}|\overline{0}\rangle,$

$B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \frac{1}{2}\langle\overline{0}|P_{0,2t}\left( {P_{2t + 2,4L - 2t - 2} + \overline{Z}P_{2t + 2,4L - 2t - 2}\overline{Z}} \right)P_{4L - 2t,4L}|\overline{0}\rangle.$

Given θ and x _(¬j), we first compute P_(0,2t), P_(2t+2,4L-2t-2) and P_(4L-2t,4L) in O(L) time. Then we calculate C_(j)(θ;x _(¬j)), S_(j)(θ;x _(¬j)) and B_(j)(θ;x _(¬j)). This procedure takes only O(L) time.

Next, we show how to compute C′_(j)(θ;x _(¬j)), S′_(j)(θ;x _(¬j)) and B′_(j)(θ;x _(¬j)). Using the above and the fact P_(a,b) = P_(a,4L-2t-2)W_(4L-2t-1)P_(4L-2t,b) for any a ≤ 4L - 2t - 1 ≤ b, we obtain

$\begin{matrix} {Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{2L + 1,2L + 1}{W^{\prime}}_{2L + 2}P_{2L + 3,4L - 2t - 2}W_{4L - 2t - 1}P_{4L - 2t,4L}} \\ {+ P_{2L + 1,2L + 3}{W^{\prime}}_{2L + 4}P_{2L + 5,4L - 2t - 2}W_{4L - 2t - 1}P_{4L - 2t,4L}} \\ {+ \cdots} \\ {+ P_{2L + 1,4L - 2t - 3}{W^{\prime}}_{4L - 2t - 2}W_{4L - 2t - 1}P_{4L - 2t,4L}} \\ {+ P_{2L + 1,4L - 2t - 2}W_{4L - 2t - 1}{W^{\prime}}_{4L - 2t}P_{4L - 2t + 1,4L}} \\ {+ \cdots} \\ {+ P_{2L + 1,4L - 2t - 2}W_{4L - 2t - 1}P_{4L - 2t,4L - 3}{W^{\prime}}_{4L - 2}P_{4L - 1,4L}} \\ {+ P_{2L + 1,4L - 2t - 2}W_{4L - 2t - 1}P_{4L - 2t,4L - 1}{W^{\prime}}_{4L}.} \end{matrix}$

Then it follows that

$\begin{array}{l} {Q\left( {\theta;\overset{\rightarrow}{x}} \right)^{\dagger}P(\theta)Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} \\ {= A_{t}^{(1)}W_{2t + 1}B_{t}^{(1)}W_{4L - 2t - 1}C_{t}^{(1)}} \\ {+ A_{t}^{(2)}W_{2t + 1}B_{t}^{(2)}W_{4L - 2t - 1}C_{t}^{(2)}} \\ {= A_{t}^{(1)}V\left( {- x_{j}} \right)B_{t}^{(1)}V\left( x_{j} \right)C_{t}^{(1)} + A_{t}^{(2)}V\left( {- x_{j}} \right)B_{t}^{(2)}V\left( x_{j} \right)C_{t}^{(2)},} \end{array}$

where

A_(t)⁽¹⁾ = P_(0, 2t),

B_(t)⁽¹⁾ = P_(2t + 2, 4L − 2t − 2),

$\begin{matrix} {C_{t}^{(1)} = {\sum\limits_{k = 0}^{t}{P_{4L - 2t,4L - 2k - 1}{W^{\prime}}_{4L - 2k}P_{4L - 2k + 1,4L}}}} \\ {= {\sum\limits_{k = 0}^{t}{P_{4L - 2t,4L - 2k - 1}U^{\prime}\left( {\theta;x_{2k + 1}} \right)P_{4L - 2k + 1,4L},}}} \end{matrix}$

A_(t)⁽²⁾ = P_(0, 2t),

$\begin{matrix} {B_{t}^{(2)} = {\sum\limits_{k = t + 1}^{L - 1}{P_{2t + 2,4L - 2k - 1}{W^{\prime}}_{4L - 2k}P_{4L - 2k + 1,4L - 2t - 2}}}} \\ {= {\sum\limits_{k = t + 1}^{L - 1}{P_{2t + 2,4L - 2k - 1}U^{\prime}\left( {\theta;x_{2k + 1}} \right)P_{4L - 2k + 1,4L - 2t - 2},}}} \end{matrix}$

C_(t)⁽²⁾ = P_(4L − 2t, 4L).

Meanwhile, we have

$\begin{matrix} {Q\left( {\theta;\overset{\rightarrow}{x}} \right)^{\dagger}P^{\prime}(\theta)Q\left( {\theta;\overset{\rightarrow}{x}} \right) = A_{t}^{(3)}W_{2t + 1}B_{t}^{(3)}W_{4L - 2t - 1}C_{t}^{(3)}} \\ {= A_{t}^{(3)}V\left( {- x_{j}} \right)B_{t}^{(3)}V\left( x_{j} \right)C_{t}^{(3)},} \end{matrix}$

where

$\begin{matrix} {A_{t}^{(3)} = P_{0,2t},} \\ {B_{t}^{(3)} = P_{2t + 2,2L - 1}P^{\prime}(\theta)P{}_{2L + 1,4L - 2t - 2},} \\ {C_{t}^{(3)} = P_{4L - 2t,4L}.} \end{matrix}$

Combining the above facts yields

$\begin{matrix} {\Delta^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = {C^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( {2x_{j}} \right) + {S^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( {2x_{j}} \right)} \\ {+ {B^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right),} \end{matrix}$

where

$\begin{matrix} {{C^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(1)}\left( {B_{t}^{(1)} - \overline{Z}B_{t}^{(1)}\overline{Z}} \right)C_{t}^{(1)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(2)}\left( {B_{t}^{(2)} - \overline{Z}B_{t}^{(2)}\overline{Z}} \right)C_{t}^{(2)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \frac{1}{2}\left\langle {\overline{0}\left| {A_{t}^{(3)}\left( {B_{t}^{(3)} - \overline{Z}B_{t}^{(3)}\overline{Z}} \right)C_{t}^{(3)}} \right|\overline{0}} \right\rangle,} \end{matrix}$

$\begin{matrix} {{S^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \text{Im}\left( \left\langle {\overline{0}\left| {A_{t}^{(1)}\left( {B_{t}^{(1)}\overline{Z} - \overline{Z}B_{t}^{(1)}} \right)C_{t}^{(1)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \text{Im}\left( \left\langle {\overline{0}\left| {A_{t}^{(2)}\left( {B_{t}^{(2)}\overline{Z} - \overline{Z}B_{t}^{(2)}} \right)C_{t}^{(2)}} \right|\overline{0}} \right\rangle \right)} \\ {- \frac{i}{2}\left\langle {\overline{0}\left| {A_{t}^{(3)}\left( {B_{t}^{(3)}\overline{Z} - \overline{Z}B_{t}^{(3)}} \right)C_{t}^{(3)}} \right|\overline{0}} \right\rangle,} \end{matrix}$

$\begin{matrix} {{B^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(1)}\left( {B_{t}^{(1)} + \overline{Z}B_{t}^{(1)}\overline{Z}} \right)C_{t}^{(1)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(2)}\left( {B_{t}^{(2)} + \overline{Z}B_{t}^{(2)}\overline{Z}} \right)C_{t}^{(2)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \frac{1}{2}\left\langle {\overline{0}\left| {A_{t}^{(3)}\left( {B_{t}^{(3)} + \overline{Z}B_{t}^{(3)}\overline{Z}} \right)C_{t}^{(3)}} \right|\overline{0}} \right\rangle.} \end{matrix}$

Given θ and ${\overset{\rightarrow}{x}}_{\neg j}$, we first compute the following matrices in a total of O(L) time by a standard dynamic programming technique:

-   P_(0,2t), P_(2t+2,4L-2t-2), P_(4L-2t,4L), P_(2t+2,2L-1), P_(2L+1,4L-2t-2);
-   P_(4L-2t,4L-2k-1) and P_(4L-2k+1,4L) for k = 0, 1, ..., t;
-   P_(2t+2,4L-2k-1) and P_(4L-2k+1,4L-2t-2) for k = t + 1, t + 2, ..., L − 1.

Then we compute A_(t)^((i)), B_(t)^((i)) and C_(t)^((i)) for i = 1, 2, 3. After that, we calculate C′_(j)(θ;x _(¬j)), S′_(j)(θ;x _(¬j)) and B′_(j)(θ;x _(¬j)). Overall, this procedure takes O(L) time.
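
The dynamic programming step can be pictured as building running products. The following is a minimal sketch in plain Python, not the implementation described in this document: it assumes each W_(m) is available as a 2×2 matrix stored as nested lists, and the helper names `matmul`, `prefix_products` and `suffix_products` are hypothetical. Each family of matrices listed above shares one endpoint, so it can be produced as a run of such products with one multiplication per new entry.

```python
# Illustrative only: running products over a chain W_0, ..., W_{N-1}.
IDENTITY = [[1, 0], [0, 1]]


def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def prefix_products(chain):
    """Return pre with pre[m] = W_0 W_1 ... W_{m-1}; pre[0] is the identity."""
    pre = [IDENTITY]
    for w in chain:
        pre.append(matmul(pre[-1], w))  # one multiplication per new entry
    return pre


def suffix_products(chain):
    """Return suf with suf[m] = W_m ... W_{N-1}; suf[N] is the identity."""
    suf = [IDENTITY]
    for w in reversed(chain):
        suf.append(matmul(w, suf[-1]))  # extend the product on the left
    suf.reverse()
    return suf
```

For a chain of N matrices both tables take N multiplications, and `matmul(pre[m], suf[m])` reproduces the full product for every split point m, which is the property the O(L) bound relies on.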

-   ● Case 2: j = 2t + 1 is odd, where 0 ≤ t ≤ L − 1. In this case, W_(2t) = U(θ;-x_(j)) and W_(4L-2t) = U(θ;x_(j)). Then we get
-   $\begin{array}{l} {\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)} \\ {= \left\langle {\overline{0}\left| {P_{0,2t - 1}U\left( {\theta; - x_{j}} \right)P_{2t + 1,4L - 2t - 1}U\left( {\theta;x_{j}} \right)P_{4L - 2t + 1,4L}} \right|\overline{0}} \right\rangle} \\ {= C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( {2x_{j}} \right) + S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( {2x_{j}} \right) + B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right),} \end{array}$
-   ● where
-   $C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \frac{1}{2}\left\langle {\overline{0}\left| {P_{0,2t - 1}\left( {P_{2t + 1,4L - 2t - 1} - P(\theta)P_{2t + 1,4L - 2t - 1}P(\theta)} \right)P_{4L - 2t + 1,4L}} \right|\overline{0}} \right\rangle,$
-   $S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = - \frac{i}{2}\left\langle {\overline{0}\left| {P_{0,2t - 1}\left( {P_{2t + 1,4L - 2t - 1}P(\theta) - P(\theta)P_{2t + 1,4L - 2t - 1}} \right)P_{4L - 2t + 1,4L}} \right|\overline{0}} \right\rangle,$
-   $B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \frac{1}{2}\left\langle {\overline{0}\left| {P_{0,2t - 1}\left( {P_{2t + 1,4L - 2t - 1} + P(\theta)P_{2t + 1,4L - 2t - 1}P(\theta)} \right)P_{4L - 2t + 1,4L}} \right|\overline{0}} \right\rangle.$
-   ● Given θ and ${\overset{\rightarrow}{x}}_{\neg j}$, we first compute P_(0,2t-1), P_(2t+1,4L-2t-1) and P_(4L-2t+1,4L) in O(L) time. Then we calculate $C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)$, $S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)$ and $B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)$. This procedure takes only O(L) time.
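
The trigonometric form in Case 2 can be checked numerically on a toy example. The sketch below uses plain Python and makes two assumptions that are not taken from this passage: P(θ) is replaced by a 2×2 real reflection with P² = I, and the rotation convention is U(θ;x) = cos(x) I - i sin(x) P(θ). All function names are hypothetical, and the left, middle and right factors are arbitrary stand-ins for P_(0,2t-1), P_(2t+1,4L-2t-1) and P_(4L-2t+1,4L).

```python
import math


def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def chain(*matrices):
    """Left-to-right product of any number of 2x2 matrices."""
    out = [[1, 0], [0, 1]]
    for m in matrices:
        out = matmul(out, m)
    return out


def combine(a, b, sign):
    """Entrywise a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]


def reflection(theta):
    """Toy stand-in for P(theta): real symmetric with P^2 = I (assumption)."""
    return [[math.cos(theta), math.sin(theta)],
            [math.sin(theta), -math.cos(theta)]]


def rotation(p, x):
    """Assumed convention U(x) = cos(x) I - i sin(x) P."""
    return [[math.cos(x) * (i == j) - 1j * math.sin(x) * p[i][j]
             for j in range(2)] for i in range(2)]


def coefficients(left, mid, right, p):
    """Return (C_j, S_j, B_j) for <0| left U(-x) mid U(x) right |0>."""
    pmp = chain(p, mid, p)
    commutator = combine(matmul(mid, p), matmul(p, mid), -1)  # mid P - P mid
    c_j = 0.5 * chain(left, combine(mid, pmp, -1), right)[0][0]
    s_j = -0.5j * chain(left, commutator, right)[0][0]
    b_j = 0.5 * chain(left, combine(mid, pmp, +1), right)[0][0]
    return c_j, s_j, b_j


def direct_value(left, mid, right, p, x):
    """Evaluate <0| left U(-x) mid U(x) right |0> by direct multiplication."""
    return chain(left, rotation(p, -x), mid, rotation(p, x), right)[0][0]


def trig_value(coeffs, x):
    """Evaluate C_j cos(2x) + S_j sin(2x) + B_j."""
    c_j, s_j, b_j = coeffs
    return c_j * math.cos(2 * x) + s_j * math.sin(2 * x) + b_j


# Arbitrary test operators; only P^2 = I matters for the identity.
P = reflection(0.7)
Z = [[1, 0], [0, -1]]
LEFT, MID, RIGHT = rotation(Z, 0.3), chain(rotation(Z, 0.5), P), rotation(Z, 1.1)
COEFFS = coefficients(LEFT, MID, RIGHT, P)
```

Because the three coefficients do not depend on x, they are computed once and then `trig_value(COEFFS, x)` can be evaluated for any x at constant cost, which should agree with `direct_value(LEFT, MID, RIGHT, P, x)` up to floating-point error.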

Next, we describe how to compute ${C^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)$, ${S^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)$ and ${B^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)$. Using the above and the fact P_(a,b) = P_(a,4L-2t-1)W_(4L-2t)P_(4L-2t+1,b) for any a ≤ 4L - 2t ≤ b, we obtain

$\begin{matrix} {Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = P_{2L + 1,2L + 1}{W^{\prime}}_{2L + 2}P_{2L + 3,4L - 2t - 1}W_{4L - 2t}P_{4L - 2t + 1,4L}} \\ {+ P_{2L + 1,2L + 3}{W^{\prime}}_{2L + 4}P_{2L + 5,4L - 2t - 1}W_{4L - 2t}P_{4L - 2t + 1,4L}} \\ {+ \cdots} \\ {+ P_{2L + 1,4L - 2t - 3}{W^{\prime}}_{4L - 2t - 2}P_{4L - 2t - 1,4L - 2t - 1}W_{4L - 2t}P_{4L - 2t + 1,4L}} \\ {+ P_{2L + 1,4L - 2t - 1}{W^{\prime}}_{4L - 2t}P_{4L - 2t + 1,4L}} \\ {+ P_{2L + 1,4L - 2t - 1}W_{4L - 2t}P_{4L - 2t + 1,4L - 2t + 1}{W^{\prime}}_{4L - 2t + 2}P_{4L - 2t + 3,4L}} \\ {+ \cdots} \\ {+ P_{2L + 1,4L - 2t - 1}W_{4L - 2t}P_{4L - 2t + 1,4L - 3}{W^{\prime}}_{4L - 2}P_{4L - 1,4L}} \\ {+ P_{2L + 1,4L - 2t - 1}W_{4L - 2t}P_{4L - 2t + 1,4L - 1}{W^{\prime}}_{4L}.} \end{matrix}$

Then it follows that

$\begin{matrix} {Q\left( {\theta;\overset{\rightarrow}{x}} \right)^{\dagger}P(\theta)Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = A_{t}^{(1)}W_{2t}B_{t}^{(1)}W_{4L - 2t}C_{t}^{(1)}} \\ {+ A_{t}^{(2)}W_{2t}B_{t}^{(2)}{W^{\prime}}_{4L - 2t}C_{t}^{(2)}} \\ {+ A_{t}^{(3)}W_{2t}B_{t}^{(3)}W_{4L - 2t}C_{t}^{(3)},} \\ {= A_{t}^{(1)}U\left( {\theta; - x_{j}} \right)B_{t}^{(1)}U\left( {\theta;x_{j}} \right)C_{t}^{(1)}} \\ {+ A_{t}^{(2)}U\left( {\theta; - x_{j}} \right)B_{t}^{(2)}U^{\prime}\left( {\theta;x_{j}} \right)C_{t}^{(2)}} \\ {+ A_{t}^{(3)}U\left( {\theta; - x_{j}} \right)B_{t}^{(3)}U\left( {\theta;x_{j}} \right)C_{t}^{(3)},} \end{matrix}$

where

A_(t)⁽¹⁾ = P_(0, 2t − 1),

B_(t)⁽¹⁾ = P_(2t + 1, 4L − 2t − 1),

$\begin{array}{l} {C_{t}^{(1)} = {\sum\limits_{k = 0}^{t - 1}{P_{4L - 2t + 1,4L - 2k - 1}{W^{\prime}}_{4L - 2k}P_{4L - 2k + 1,4L}}}} \\ {= {\sum\limits_{k = 0}^{t - 1}{P_{4L - 2t + 1,4L - 2k - 1}U^{\prime}\left( {\theta;x_{2k + 1}} \right)P_{4L - 2k + 1,4L},}}} \end{array}$

A_(t)⁽²⁾ = P_(0, 2t − 1),

B_(t)⁽²⁾ = P_(2t + 1, 4L − 2t − 1),

C_(t)⁽²⁾ = P_(4L − 2t + 1, 4L),

A_(t)⁽³⁾ = P_(0, 2t − 1),

$\begin{array}{l} {B_{t}^{(3)} = {\sum\limits_{k = t + 1}^{L - 1}{P_{2t + 1,4L - 2k - 1}{W^{\prime}}_{4L - 2k}P_{4L - 2k + 1,4L - 2t - 1}}}} \\ {= {\sum\limits_{k = t + 1}^{L - 1}{P_{2t + 1,4L - 2k - 1}U^{\prime}\left( {\theta;x_{2k + 1}} \right)P_{4L - 2k + 1,4L - 2t - 1},}}} \end{array}$

C_(t)⁽³⁾ = P_(4L − 2t + 1, 4L).

Meanwhile, we have

$\begin{matrix} {Q\left( {\theta;\overset{\rightarrow}{x}} \right)^{\dagger}P^{\prime}(\theta)Q\left( {\theta;\overset{\rightarrow}{x}} \right) = A_{t}^{(4)}W_{2t}B_{t}^{(4)}W_{4L - 2t}C_{t}^{(4)}} \\ {= A_{t}^{(4)}U\left( {\theta; - x_{j}} \right)B_{t}^{(4)}U\left( {\theta;x_{j}} \right)C_{t}^{(4)},} \end{matrix}$

where

A_(t)⁽⁴⁾ = P_(0, 2t − 1),

B_(t)⁽⁴⁾ = P_(2t + 1, 2L − 1)P^(′)(θ)P_(2L + 1, 4L − 2t − 1),

C_(t)⁽⁴⁾ = P_(4L − 2t + 1, 4L).

Combining the above facts yields

$\begin{array}{l} {\Delta^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right) = {C^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\cos\left( {2x_{j}} \right) + {S^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)\sin\left( {2x_{j}} \right)} \\ {+ {B^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right),} \end{array}$

where

$\begin{matrix} {{C^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(1)}\left( {B_{t}^{(1)} - P(\theta)B_{t}^{(1)}P(\theta)} \right)C_{t}^{(1)}} \right|\overline{0}} \right\rangle \right)} \\ {- \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(2)}P(\theta)B_{t}^{(2)}P^{\prime}(\theta)C_{t}^{(2)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(3)}\left( {B_{t}^{(3)} - P(\theta)B_{t}^{(3)}P(\theta)} \right)C_{t}^{(3)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \frac{1}{2}\left\langle {\overline{0}\left| {A_{t}^{(4)}\left( {B_{t}^{(4)} - P(\theta)B_{t}^{(4)}P(\theta)} \right)C_{t}^{(4)}} \right|\overline{0}} \right\rangle,} \end{matrix}$

$\begin{matrix} {{S^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \text{Im}\left( \left\langle {\overline{0}\left| {A_{t}^{(1)}\left( {B_{t}^{(1)}P(\theta) - P(\theta)B_{t}^{(1)}} \right)C_{t}^{(1)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \text{Im}\left( \left\langle {\overline{0}\left| {A_{t}^{(2)}B_{t}^{(2)}P^{\prime}(\theta)C_{t}^{(2)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \text{Im}\left( \left\langle {\overline{0}\left| {A_{t}^{(3)}\left( {B_{t}^{(3)}P(\theta) - P(\theta)B_{t}^{(3)}} \right)C_{t}^{(3)}} \right|\overline{0}} \right\rangle \right)} \\ {- \frac{i}{2}\left\langle {\overline{0}\left| {A_{t}^{(4)}\left( {B_{t}^{(4)}P(\theta) - P(\theta)B_{t}^{(4)}} \right)C_{t}^{(4)}} \right|\overline{0}} \right\rangle,} \end{matrix}$

$\begin{matrix} {{B^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right) = \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(1)}\left( {B_{t}^{(1)} + P(\theta)B_{t}^{(1)}P(\theta)} \right)C_{t}^{(1)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(2)}P(\theta)B_{t}^{(2)}P^{\prime}(\theta)C_{t}^{(2)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \text{Re}\left( \left\langle {\overline{0}\left| {A_{t}^{(3)}\left( {B_{t}^{(3)} + P(\theta)B_{t}^{(3)}P(\theta)} \right)C_{t}^{(3)}} \right|\overline{0}} \right\rangle \right)} \\ {+ \frac{1}{2}\left\langle {\overline{0}\left| {A_{t}^{(4)}\left( {B_{t}^{(4)} + P(\theta)B_{t}^{(4)}P(\theta)} \right)C_{t}^{(4)}} \right|\overline{0}} \right\rangle.} \end{matrix}$

Given θ and ${\overset{\rightarrow}{x}}_{\neg j}$, we first compute the following matrices in a total of O(L) time by a standard dynamic programming technique:

-   P_(0,2t-1), P_(2t+1,4L-2t-1), P_(4L-2t+1,4L), P_(2t+1,2L-1), P_(2L+1,4L-2t-1);
-   P_(4L-2t+1,4L-2k-1) and P_(4L-2k+1,4L) for k = 0, 1, ..., t − 1;
-   P_(2t+1,4L-2k-1) and P_(4L-2k+1,4L-2t-1) for k = t + 1, t + 2, ..., L − 1.

Then we compute A_(t)^((i)), B_(t)^((i)) and C_(t)^((i)) for i = 1, 2, 3, 4. After that, we calculate ${C^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)$, ${S^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)$ and ${B^{\prime}}_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)$. Overall, this procedure takes O(L) time.

It is to be understood that although the invention has been described above in terms of particular embodiments, the foregoing embodiments are provided as illustrative only, and do not limit or define the scope of the invention. Various other embodiments, including but not limited to the following, are also within the scope of the claims. For example, elements and components described herein may be further divided into additional components or joined together to form fewer components for performing the same functions.

Various physical embodiments of a quantum computer are suitable for use according to the present disclosure. In general, the fundamental data storage unit in quantum computing is the quantum bit, or qubit. The qubit is a quantum-computing analog of a classical digital computer system bit. A classical bit is considered to occupy, at any given point in time, one of two possible states corresponding to the binary digits (bits) 0 or 1. By contrast, a qubit is implemented in hardware by a physical medium with quantum-mechanical characteristics. Such a medium, which physically instantiates a qubit, may be referred to herein as a “physical instantiation of a qubit,” a “physical embodiment of a qubit,” a “medium embodying a qubit,” or similar terms, or simply as a “qubit,” for ease of explanation. It should be understood, therefore, that references herein to “qubits” within descriptions of embodiments of the present invention refer to physical media which embody qubits.

Each qubit has an infinite number of different potential quantum-mechanical states. When the state of a qubit is physically measured, the measurement produces one of two different basis states resolved from the state of the qubit. Thus, a single qubit can represent a one, a zero, or any quantum superposition of those two qubit states; a pair of qubits can be in any quantum superposition of 4 orthogonal basis states; and three qubits can be in any superposition of 8 orthogonal basis states. The function that defines the quantum-mechanical states of a qubit is known as its wavefunction. The wavefunction also specifies the probability distribution of outcomes for a given measurement. A qubit, which has a quantum state of dimension two (i.e., has two orthogonal basis states), may be generalized to a d-dimensional “qudit,” where d may be any integral value, such as 2, 3, 4, or higher. In the general case of a qudit, measurement of the qudit produces one of d different basis states resolved from the state of the qudit. Any reference herein to a qubit should be understood to refer more generally to a d-dimensional qudit with any value of d.
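
As a small illustration of how a wavefunction fixes the distribution of measurement outcomes, the sketch below applies the standard rule that outcome k occurs with probability proportional to the squared magnitude of amplitude k. It assumes the state is given as a plain list of complex amplitudes in the measurement basis; the function name `measurement_probabilities` is hypothetical. The same code covers a qubit (two amplitudes) and a d-dimensional qudit (d amplitudes).

```python
def measurement_probabilities(amplitudes):
    """Return outcome probabilities |a_k|^2, normalized to sum to 1."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("state vector must be nonzero")
    return [w / total for w in weights]


# An equal superposition of the two basis states of a qubit.
qubit = [1 / 2 ** 0.5, 1j / 2 ** 0.5]
# An unnormalized equal superposition of the three basis states of a qutrit.
qutrit = [1, 1, 1]
```

The complex phase of an amplitude does not change these probabilities; it matters only once gates mix the amplitudes before measurement.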

Although certain descriptions of qubits herein may describe such qubits in terms of their mathematical properties, each such qubit may be implemented in a physical medium in any of a variety of different ways. Examples of such physical media include superconducting material, trapped ions, photons, optical cavities, individual electrons trapped within quantum dots, point defects in solids (e.g., phosphorus donors in silicon or nitrogen-vacancy centers in diamond), molecules (e.g., alanine, vanadium complexes), or aggregations of any of the foregoing that exhibit qubit behavior, that is, comprising quantum states and transitions therebetween that can be controllably induced or detected.

For any given medium that implements a qubit, any of a variety of properties of that medium may be chosen to implement the qubit. For example, if electrons are chosen to implement qubits, then the x component of their spin degree of freedom may be chosen as the property of such electrons to represent the states of such qubits. Alternatively, the y component, or the z component of the spin degree of freedom may be chosen as the property of such electrons to represent the state of such qubits. This is merely a specific example of the general feature that for any physical medium that is chosen to implement qubits, there may be multiple physical degrees of freedom (e.g., the x, y, and z components in the electron spin example) that may be chosen to represent 0 and 1. For any particular degree of freedom, the physical medium may controllably be put in a state of superposition, and measurements may then be taken in the chosen degree of freedom to obtain readouts of qubit values.

Certain implementations of quantum computers, referred to as gate model quantum computers, comprise quantum gates. In contrast to classical gates, there is an infinite number of possible single-qubit quantum gates that change the state vector of a qubit. Changing the state vector of a qubit typically is referred to as a single-qubit rotation, and may also be referred to herein as a state change or a single-qubit quantum-gate operation. A rotation, state change, or single-qubit quantum-gate operation may be represented mathematically by a unitary 2×2 matrix with complex elements. A rotation corresponds to a rotation of a qubit state within its Hilbert space, which may be conceptualized as a rotation of the Bloch sphere. (As is well-known to those having ordinary skill in the art, the Bloch sphere is a geometrical representation of the space of pure states of a qubit.) Multi-qubit gates alter the quantum state of a set of qubits. For example, two-qubit gates rotate the state of two qubits as a rotation in the four-dimensional Hilbert space of the two qubits. (As is well-known to those having ordinary skill in the art, a Hilbert space is an abstract vector space possessing the structure of an inner product that allows length and angle to be measured. Furthermore, Hilbert spaces are complete: there are enough limits in the space to allow the techniques of calculus to be used.)
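
To make the matrix picture concrete, the sketch below builds one familiar single-qubit rotation, a rotation about the x axis of the Bloch sphere, and checks that its matrix is unitary. The choice of this particular gate and the helper names `rx`, `dagger`, `is_unitary` and `apply_gate` are illustrative assumptions, not elements of the described system.

```python
import math


def rx(angle):
    """2x2 matrix of a rotation by `angle` about the x axis of the Bloch sphere."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -1j * s], [-1j * s, c]]


def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(u):
    """Conjugate transpose of a 2x2 matrix."""
    return [[u[j][i].conjugate() for j in range(2)] for i in range(2)]


def is_unitary(u, tol=1e-12):
    """Check that U^dagger U equals the identity within `tol`."""
    prod = matmul(dagger(u), u)
    return all(abs(prod[i][j] - (1 if i == j else 0)) <= tol
               for i in range(2) for j in range(2))


def apply_gate(u, state):
    """Apply a 2x2 gate to a qubit state vector [a0, a1]."""
    return [u[0][0] * state[0] + u[0][1] * state[1],
            u[1][0] * state[0] + u[1][1] * state[1]]
```

Because `angle` is a continuous parameter, `rx` alone already describes infinitely many distinct gates, in line with the contrast drawn above with classical gates.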

A quantum circuit may be specified as a sequence of quantum gates. As described in more detail below, the term “quantum gate,” as used herein, refers to the application of a gate control signal (defined below) to one or more qubits to cause those qubits to undergo certain physical transformations and thereby to implement a logical gate operation. To conceptualize a quantum circuit, the matrices corresponding to the component quantum gates may be multiplied together in the order specified by the gate sequence to produce a 2^(n)×2^(n) complex matrix representing the same overall state change on n qubits. A quantum circuit may thus be expressed as a single resultant operator. However, designing a quantum circuit in terms of constituent gates allows the design to conform to a standard set of gates, and thus enables greater ease of deployment. A quantum circuit thus corresponds to a design for actions taken upon the physical components of a quantum computer.
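
The composition of gate matrices into a single operator can be sketched for n = 2 qubits. The example below is an illustration under stated assumptions: gates are given as full-register matrices in nested lists, state vectors are columns (so a later gate multiplies on the left), and the first qubit is the most significant index. The helper names `kron`, `matmul` and `circuit_unitary`, and the choice of a Hadamard followed by a CNOT, are hypothetical.

```python
import math


def kron(a, b):
    """Kronecker (tensor) product of two matrices stored as nested lists."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    """Multiply two matrices of compatible sizes."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def circuit_unitary(gates):
    """Combine full-register gate matrices; gates[0] is applied first."""
    size = len(gates[0])
    total = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    for gate in gates:
        total = matmul(gate, total)  # later gates act on the left
    return total


S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]                      # Hadamard gate
I2 = [[1, 0], [0, 1]]                      # identity on one qubit
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0],
        [0, 0, 0, 1], [0, 0, 1, 0]]        # control = first qubit

# Hadamard on the first qubit, then CNOT: a 4x4 (2^2 x 2^2) operator.
BELL_CIRCUIT = circuit_unitary([kron(H, I2), CNOT])
# The first column is the image of the all-zero basis state.
AMPLITUDES = [row[0] for row in BELL_CIRCUIT]
```

The first column of `BELL_CIRCUIT` holds equal amplitudes on the first and last basis states and zero elsewhere, which is the familiar two-qubit entangled state produced by this gate sequence.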

A given variational quantum circuit may be parameterized in a suitable device-specific manner. More generally, the quantum gates making up a quantum circuit may have an associated plurality of tuning parameters. For example, in embodiments based on optical switching, tuning parameters may correspond to the angles of individual optical elements.

In certain embodiments of quantum circuits, the quantum circuit includes both one or more gates and one or more measurement operations. Quantum computers implemented using such quantum circuits are referred to herein as implementing “measurement feedback.” For example, a quantum computer implementing measurement feedback may execute the gates in a quantum circuit and then measure only a subset (i.e., fewer than all) of the qubits in the quantum computer, and then decide which gate(s) to execute next based on the outcome(s) of the measurement(s). In particular, the measurement(s) may indicate a degree of error in the gate operation(s), and the quantum computer may decide which gate(s) to execute next based on the degree of error. The quantum computer may then execute the gate(s) indicated by the decision. This process of executing gates, measuring a subset of the qubits, and then deciding which gate(s) to execute next may be repeated any number of times. Measurement feedback may be useful for performing quantum error correction, but is not limited to use in performing quantum error correction. For every quantum circuit, there is an error-corrected implementation of the circuit with or without measurement feedback.

Some embodiments described herein generate, measure, or utilize quantum states that approximate a target quantum state (e.g., a ground state of a Hamiltonian). As will be appreciated by those trained in the art, there are many ways to quantify how well a first quantum state “approximates” a second quantum state. In the following description, any concept or definition of approximation known in the art may be used without departing from the scope hereof. For example, when the first and second quantum states are represented as first and second vectors, respectively, the first quantum state approximates the second quantum state when an inner product between the first and second vectors (called the “fidelity” between the two quantum states) is greater than a predefined amount (typically labeled ε). In this example, the fidelity quantifies how “close” or “similar” the first and second quantum states are to each other. The fidelity represents a probability that a measurement of the first quantum state will give the same result as if the measurement were performed on the second quantum state. Proximity between quantum states can also be quantified with a distance measure, such as a Euclidean norm, a Hamming distance, or another type of norm known in the art. Proximity between quantum states can also be defined in computational terms. For example, the first quantum state approximates the second quantum state when a polynomial time-sampling of the first quantum state gives some desired information or property that it shares with the second quantum state.
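
The fidelity-based test can be sketched in a few lines. The code below assumes one common convention, in which the fidelity of two pure states is the squared magnitude of their normalized inner product, so that it lies between 0 and 1 and can be read as a probability; other conventions exist, and the function names and the threshold argument `epsilon` are hypothetical.

```python
def inner_product(a, b):
    """Inner product <a|b> of two state vectors given as lists of amplitudes."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def fidelity(a, b):
    """Squared magnitude of the normalized inner product (assumed convention)."""
    norm_a = sum(abs(x) ** 2 for x in a)
    norm_b = sum(abs(y) ** 2 for y in b)
    return abs(inner_product(a, b)) ** 2 / (norm_a * norm_b)


def approximates(a, b, epsilon):
    """True when the fidelity between a and b exceeds the threshold epsilon."""
    return fidelity(a, b) > epsilon


ZERO = [1, 0]
ONE = [0, 1]
PLUS = [2 ** -0.5, 2 ** -0.5]
```

With these vectors, `fidelity(ZERO, ZERO)` is 1, `fidelity(ZERO, ONE)` is 0, and `fidelity(ZERO, PLUS)` is approximately 0.5, so `PLUS` approximates `ZERO` only for thresholds below one half.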

Not all quantum computers are gate model quantum computers. Embodiments of the present invention are not limited to being implemented using gate model quantum computers. As an alternative example, embodiments of the present invention may be implemented, in whole or in part, using a quantum computer that is implemented using a quantum annealing architecture, which is an alternative to the gate model quantum computing architecture. More specifically, quantum annealing (QA) is a metaheuristic for finding the global minimum of a given objective function over a given set of candidate solutions (candidate states), by a process using quantum fluctuations.

FIG. 2B shows a diagram illustrating operations typically performed by a computer system 250 which implements quantum annealing. The system 250 includes both a quantum computer 252 and a classical computer 254. Operations shown on the left of the dashed vertical line 256 typically are performed by the quantum computer 252, while operations shown on the right of the dashed vertical line 256 typically are performed by the classical computer 254.

Quantum annealing starts with the classical computer 254 generating an initial Hamiltonian 260 and a final Hamiltonian 262 based on a computational problem 258 to be solved, and providing the initial Hamiltonian 260, the final Hamiltonian 262 and an annealing schedule 270 as input to the quantum computer 252. The quantum computer 252 prepares a well-known initial state 266 (FIG. 2B, operation 264), such as a quantum-mechanical superposition of all possible states (candidate states) with equal weights, based on the initial Hamiltonian 260. The quantum computer 252 starts in the initial state 266, and evolves its state according to the annealing schedule 270 following the time-dependent Schrödinger equation, a natural quantum-mechanical evolution of physical systems (FIG. 2B, operation 268). More specifically, the state of the quantum computer 252 undergoes time evolution under a time-dependent Hamiltonian, which starts from the initial Hamiltonian 260 and terminates at the final Hamiltonian 262. If the rate of change of the system Hamiltonian is slow enough, the system stays close to the ground state of the instantaneous Hamiltonian. If the rate of change of the system Hamiltonian is accelerated, the system may leave the ground state temporarily but produce a higher likelihood of concluding in the ground state of the final problem Hamiltonian, i.e., diabatic quantum computation. At the end of the time evolution, the set of qubits on the quantum annealer is in a final state 272, which is expected to be close to the ground state of the classical Ising model that corresponds to the solution to the original computational problem 258. An experimental demonstration of the success of quantum annealing for random magnets was reported immediately after the initial theoretical proposal.
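
As an illustration only, one common way to write such a time-dependent Hamiltonian is a linear interpolation between the two endpoints. The symbols H_init, H_final, T and s(t) below are assumptions introduced for this sketch; the description above requires only that the evolution start from the initial Hamiltonian 260 and terminate at the final Hamiltonian 262 according to the annealing schedule 270.

$\begin{array}{l} {H(t) = \left( {1 - s(t)} \right)H_{\text{init}} + s(t)H_{\text{final}},\quad 0 \leq t \leq T,} \\ {s(0) = 0,\quad s(T) = 1.} \end{array}$

Here s(t) plays the role of the annealing schedule: a slowly varying s(t) corresponds to the regime in which the system stays close to the instantaneous ground state, while a faster s(t) corresponds to the diabatic regime.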

The final state 272 of the quantum computer 252 is measured, thereby producing results 276 (i.e., measurements) (FIG. 2B, operation 274). The measurement operation 274 may be performed, for example, in any of the ways disclosed herein, such as those disclosed in connection with the measurement unit 110 in FIG. 1. The classical computer 254 performs postprocessing on the measurement results 276 to produce output 280 representing a solution to the original computational problem 258 (FIG. 2B, operation 278).

As yet another alternative example, embodiments of the present invention may be implemented, in whole or in part, using a quantum computer that is implemented using a one-way quantum computing architecture, also referred to as a measurement-based quantum computing architecture, which is another alternative to the gate model quantum computing architecture. More specifically, the one-way or measurement-based quantum computer (MBQC) is a method of quantum computing that first prepares an entangled resource state, usually a cluster state or graph state, then performs single-qubit measurements on it. It is “one-way” because the resource state is destroyed by the measurements.

The outcome of each individual measurement is random, but they are related in such a way that the computation always succeeds. In general the choices of basis for later measurements need to depend on the results of earlier measurements, and hence the measurements cannot all be performed at the same time.

Any of the functions disclosed herein may be implemented using means for performing those functions. Such means include, but are not limited to, any of the components disclosed herein, such as the computer-related components described below.

Referring to FIG. 1, a diagram is shown of a system 100 implemented according to one embodiment of the present invention. Referring to FIG. 2A, a flowchart is shown of a method 200 performed by the system 100 of FIG. 1 according to one embodiment of the present invention. The system 100 includes a quantum computer 102. The quantum computer 102 includes a plurality of qubits 104, which may be implemented in any of the ways disclosed herein. There may be any number of qubits 104 in the quantum computer 102. For example, the qubits 104 may include or consist of no more than 2 qubits, no more than 4 qubits, no more than 8 qubits, no more than 16 qubits, no more than 32 qubits, no more than 64 qubits, no more than 128 qubits, no more than 256 qubits, no more than 512 qubits, no more than 1024 qubits, no more than 2048 qubits, no more than 4096 qubits, or no more than 8192 qubits. These are merely examples; in practice there may be any number of qubits 104 in the quantum computer 102.

There may be any number of gates in a quantum circuit. However, in some embodiments the number of gates may be at least proportional to the number of qubits 104 in the quantum computer 102. In some embodiments the gate depth may be no greater than the number of qubits 104 in the quantum computer 102, or no greater than some linear multiple of the number of qubits 104 in the quantum computer 102 (e.g., 2, 3, 4, 5, 6, or 7).

The qubits 104 may be interconnected in any graph pattern. For example, they may be connected in a linear chain, a two-dimensional grid, an all-to-all connection, any combination thereof, or any subgraph of any of the preceding.

As will become clear from the description below, although element 102 is referred to herein as a “quantum computer,” this does not imply that all components of the quantum computer 102 leverage quantum phenomena. One or more components of the quantum computer 102 may, for example, be classical (i.e., non-quantum) components which do not leverage quantum phenomena.

The quantum computer 102 includes a control unit 106, which may include any of a variety of circuitry and/or other machinery for performing the functions disclosed herein. The control unit 106 may, for example, consist entirely of classical components. The control unit 106 generates and provides as output one or more control signals 108 to the qubits 104. The control signals 108 may take any of a variety of forms, such as any kind of electromagnetic signals, such as electrical signals, magnetic signals, optical signals (e.g., laser pulses), or any combination thereof.

For example:

-   In embodiments in which some or all of the qubits 104 are implemented as photons (also referred to as a “quantum optical” implementation) that travel along waveguides, the control unit 106 may be a beam splitter (e.g., a heater or a mirror), the control signals 108 may be signals that control the heater or the rotation of the mirror, the measurement unit 110 may be a photodetector, and the measurement signals 112 may be photons.
-   In embodiments in which some or all of the qubits 104 are implemented as charge-type qubits (e.g., transmon, X-mon, G-mon) or flux-type qubits (e.g., flux qubits, capacitively shunted flux qubits) (also referred to as a “circuit quantum electrodynamic” (circuit QED) implementation), the control unit 106 may be a bus resonator activated by a drive, the control signals 108 may be cavity modes, the measurement unit 110 may be a second resonator (e.g., a low-Q resonator), and the measurement signals 112 may be voltages measured from the second resonator using dispersive readout techniques.
-   In embodiments in which some or all of the qubits 104 are implemented as superconducting circuits, the control unit 106 may be a circuit QED-assisted control unit or a direct capacitive coupling control unit or an inductive capacitive coupling control unit, the control signals 108 may be cavity modes, the measurement unit 110 may be a second resonator (e.g., a low-Q resonator), and the measurement signals 112 may be voltages measured from the second resonator using dispersive readout techniques.
-   In embodiments in which some or all of the qubits 104 are implemented as trapped ions (e.g., electronic states of, e.g., magnesium ions), the control unit 106 may be a laser, the control signals 108 may be laser pulses, the measurement unit 110 may be a laser and either a CCD or a photodetector (e.g., a photomultiplier tube), and the measurement signals 112 may be photons.
-   In embodiments in which some or all of the qubits 104 are implemented using nuclear magnetic resonance (NMR) (in which case the qubits may be molecules, e.g., in liquid or solid form), the control unit 106 may be a radio frequency (RF) antenna, the control signals 108 may be RF fields emitted by the RF antenna, the measurement unit 110 may be another RF antenna, and the measurement signals 112 may be RF fields measured by the second RF antenna.
-   In embodiments in which some or all of the qubits 104 are implemented as nitrogen-vacancy centers (NV centers), the control unit 106 may, for example, be a laser, a microwave antenna, or a coil, the control signals 108 may be visible light, a microwave signal, or a constant electromagnetic field, the measurement unit 110 may be a photodetector, and the measurement signals 112 may be photons.
-   In embodiments in which some or all of the qubits 104 are implemented as two-dimensional quasiparticles called “anyons” (also referred to as a “topological quantum computer” implementation), the control unit 106 may be nanowires, the control signals 108 may be local electrical fields or microwave pulses, the measurement unit 110 may be superconducting circuits, and the measurement signals 112 may be voltages.
-   In embodiments in which some or all of the qubits 104 are implemented as semiconducting material (e.g., nanowires), the control unit 106 may be microfabricated gates, the control signals 108 may be RF or microwave signals, the measurement unit 110 may be microfabricated gates, and the measurement signals 112 may be RF or microwave signals.

Although not shown explicitly in FIG. 1 and not required, the measurement unit 110 may provide one or more feedback signals 114 to the control unit 106 based on the measurement signals 112. For example, quantum computers referred to as “one-way quantum computers” or “measurement-based quantum computers” utilize such a feedback signal 114 from the measurement unit 110 to the control unit 106. Such a feedback signal 114 is also necessary for the operation of fault-tolerant quantum computing and error correction.

The control signals 108 may, for example, include one or more state preparation signals which, when received by the qubits 104, cause some or all of the qubits 104 to change their states. Such state preparation signals constitute a quantum circuit also referred to as an “ansatz circuit.” The resulting state of the qubits 104 is referred to herein as an “initial state” or an “ansatz state.” The process of outputting the state preparation signal(s) to cause the qubits 104 to be in their initial state is referred to herein as “state preparation” (FIG. 2A, section 206). A special case of state preparation is “initialization,” also referred to as a “reset operation,” in which the initial state is one in which some or all of the qubits 104 are in the “zero” state, i.e., the default single-qubit state. More generally, state preparation may involve using the state preparation signals to cause some or all of the qubits 104 to be in any distribution of desired states. In some embodiments, the control unit 106 may first perform initialization on the qubits 104 and then perform preparation on the qubits 104, by first outputting a first set of state preparation signals to initialize the qubits 104, and by then outputting a second set of state preparation signals to put the qubits 104 partially or entirely into non-zero states.

Another example of control signals 108 that may be output by the control unit 106 and received by the qubits 104 are gate control signals. The control unit 106 may output such gate control signals, thereby applying one or more gates to the qubits 104. Applying a gate to one or more qubits causes the set of qubits to undergo a physical state change which embodies a corresponding logical gate operation (e.g., single-qubit rotation, two-qubit entangling gate or multi-qubit operation) specified by the received gate control signal. As this implies, in response to receiving the gate control signals, the qubits 104 undergo physical transformations which cause the qubits 104 to change state in such a way that the states of the qubits 104, when measured (see below), represent the results of performing logical gate operations specified by the gate control signals. The term “quantum gate,” as used herein, refers to the application of a gate control signal to one or more qubits to cause those qubits to undergo the physical transformations described above and thereby to implement a logical gate operation.

It should be understood that the dividing line between state preparation (and the corresponding state preparation signals) and the application of gates (and the corresponding gate control signals) may be chosen arbitrarily. For example, some or all the components and operations that are illustrated in FIGS. 1 and 2A-2B as elements of “state preparation” may instead be characterized as elements of gate application. Conversely, for example, some or all of the components and operations that are illustrated in FIGS. 1 and 2A-2B as elements of “gate application” may instead be characterized as elements of state preparation. As one particular example, the system and method of FIGS. 1 and 2A-2B may be characterized as solely performing state preparation followed by measurement, without any gate application, where the elements that are described herein as being part of gate application are instead considered to be part of state preparation. Conversely, for example, the system and method of FIGS. 1 and 2A-2B may be characterized as solely performing gate application followed by measurement, without any state preparation, and where the elements that are described herein as being part of state preparation are instead considered to be part of gate application.

The quantum computer 102 also includes a measurement unit 110, which performs one or more measurement operations on the qubits 104 to read out measurement signals 112 (also referred to herein as “measurement results”) from the qubits 104, where the measurement results 112 are signals representing the states of some or all of the qubits 104. In practice, the control unit 106 and the measurement unit 110 may be entirely distinct from each other, or contain some components in common with each other, or be implemented using a single unit (i.e., a single unit may implement both the control unit 106 and the measurement unit 110). For example, a laser unit may be used both to generate the control signals 108 and to provide stimulus (e.g., one or more laser beams) to the qubits 104 to cause the measurement signals 112 to be generated.

In general, the quantum computer 102 may perform various operations described above any number of times. For example, the control unit 106 may generate one or more control signals 108, thereby causing the qubits 104 to perform one or more quantum gate operations. The measurement unit 110 may then perform one or more measurement operations on the qubits 104 to read out a set of one or more measurement signals 112. The measurement unit 110 may repeat such measurement operations on the qubits 104 before the control unit 106 generates additional control signals 108, thereby causing the measurement unit 110 to read out additional measurement signals 112 resulting from the same gate operations that were performed before reading out the previous measurement signals 112. The measurement unit 110 may repeat this process any number of times to generate any number of measurement signals 112 corresponding to the same gate operations. The quantum computer 102 may then aggregate such multiple measurements of the same gate operations in any of a variety of ways.
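As one illustration of how repeated measurement signals from the same gate operations might be aggregated, the following minimal sketch tallies outcome frequencies and forms a sample-mean estimate with a binomial standard error. The function names `aggregate_shots` and `parity_expectation`, and the representation of each measurement result as a bitstring, are assumptions made for illustration; they are not part of the system described herein, and other aggregation schemes are equally possible.

```python
import math
from collections import Counter


def aggregate_shots(bitstrings):
    """Tally repeated measurement results (hypothetical helper).

    Returns the raw counts and the relative frequency of each bitstring.
    """
    counts = Counter(bitstrings)
    total = sum(counts.values())
    frequencies = {bits: n / total for bits, n in counts.items()}
    return counts, frequencies


def parity_expectation(bitstrings):
    """Estimate a +/-1-valued parity statistic from repeated measurements.

    Each bitstring contributes (-1) ** (number of 1s). The function returns
    the sample mean and an estimate of its standard error.
    """
    values = [(-1) ** bits.count("1") for bits in bitstrings]
    n = len(values)
    mean = sum(values) / n
    # Every value is +1 or -1, so the sample variance equals 1 - mean**2.
    stderr = math.sqrt(max(0.0, 1.0 - mean * mean) / n)
    return mean, stderr
```

The standard error shrinks as one over the square root of the number of repetitions, which is the sampling cost that the background section attributes to measurement-heavy methods.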

After the measurement unit 110 has performed one or more measurement operations on the qubits 104 after they have performed one set of gate operations, the control unit 106 may generate one or more additional control signals 108, which may differ from the previous control signals 108, thereby causing the qubits 104 to perform one or more additional quantum gate operations, which may differ from the previous set of quantum gate operations. The process described above may then be repeated, with the measurement unit 110 performing one or more measurement operations on the qubits 104 in their new states (resulting from the most recently-performed gate operations).

In general, the system 100 may implement a plurality of quantum circuits as follows. For each quantum circuit C in the plurality of quantum circuits (FIG. 2A, operation 202), the system 100 performs a plurality of “shots” on the qubits 104. The meaning of a shot will become clear from the description that follows. For each shot S in the plurality of shots (FIG. 2A, operation 204), the system 100 prepares the state of the qubits 104 (FIG. 2A, section 206). More specifically, for each quantum gate G in quantum circuit C (FIG. 2A, operation 210), the system 100 applies quantum gate G to the qubits 104 (FIG. 2A, operations 212 and 214).

Then, for each of the qubits Q 104 (FIG. 2A, operation 216), the system 100 measures the qubit Q to produce measurement output representing a current state of qubit Q (FIG. 2A, operations 218 and 220).

The operations described above are repeated for each shot S (FIG. 2A, operation 222), and circuit C (FIG. 2A, operation 224). As the description above implies, a single “shot” involves preparing the state of the qubits 104 and applying all of the quantum gates in a circuit to the qubits 104 and then measuring the states of the qubits 104; and the system 100 may perform multiple shots for one or more circuits.
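The nested loop over circuits, shots, gates, and qubits described above can be sketched classically as follows. This is a toy emulation under stated assumptions, not the system 100 itself: each qubit is modeled as an independent pair of amplitudes, only single-qubit gates are supported, and the names `apply_gate`, `measure`, and `run_circuits` are hypothetical. The comments map each loop to the operations of FIG. 2A as they are named in the text.

```python
import math
import random


def apply_gate(state, gate):
    """Apply a 2x2 gate, given as nested tuples, to one qubit's amplitudes."""
    (a, b), (c, d) = gate
    x, y = state
    return (a * x + b * y, c * x + d * y)


def measure(state, rng):
    """Sample 0 or 1 according to the Born probabilities of one qubit."""
    p_one = abs(state[1]) ** 2
    return 1 if rng.random() < p_one else 0


def run_circuits(circuits, n_qubits, n_shots, seed=0):
    """Run every circuit for n_shots shots and return all measured bits."""
    rng = random.Random(seed)
    results = []
    for circuit in circuits:                     # FIG. 2A, operation 202
        shots = []
        for _ in range(n_shots):                 # FIG. 2A, operation 204
            # Initialization: every qubit starts in the "zero" state.
            qubits = [(1.0, 0.0)] * n_qubits
            for target, gate in circuit:         # FIG. 2A, operation 210
                # FIG. 2A, operations 212 and 214: apply gate G.
                qubits[target] = apply_gate(qubits[target], gate)
            # FIG. 2A, operations 216-220: measure each qubit Q.
            shots.append([measure(q, rng) for q in qubits])
        results.append(shots)
    return results


X_GATE = ((0.0, 1.0), (1.0, 0.0))
_S = 1.0 / math.sqrt(2.0)
H_GATE = ((_S, _S), (_S, -_S))
```

For example, `run_circuits([[(0, X_GATE)]], n_qubits=2, n_shots=3)` returns three identical shots `[1, 0]`, whereas replacing `X_GATE` with `H_GATE` makes the first bit of each shot random.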

Referring to FIG. 3, a diagram is shown of a hybrid quantum-classical (HQC) computer 300 implemented according to one embodiment of the present invention. The HQC 300 includes a quantum computer component 102 (which may, for example, be implemented in the manner shown and described in connection with FIG. 1) and a classical computer component 306. The classical computer component may be a machine implemented according to the general computing model established by John von Neumann, in which programs are written in the form of ordered lists of instructions and stored within a classical (e.g., digital) memory 310 and executed by a classical (e.g., digital) processor 308 of the classical computer. The memory 310 is classical in the sense that it stores data in a storage medium in the form of bits, which have a single definite binary state at any point in time. The bits stored in the memory 310 may, for example, represent a computer program. The classical computer component 304 typically includes a bus 314. The processor 308 may read bits from and write bits to the memory 310 over the bus 314. For example, the processor 308 may read instructions from the computer program in the memory 310, and may optionally receive input data 316 from a source external to the computer 302, such as from a user input device such as a mouse, keyboard, or any other input device. The processor 308 may use instructions that have been read from the memory 310 to perform computations on data read from the memory 310 and/or the input 316, and generate output from those instructions. The processor 308 may store that output back into the memory 310 and/or provide the output externally as output data 318 via an output device, such as a monitor, speaker, or network device.

The quantum computer component 102 may include a plurality of qubits 104, as described above in connection with FIG. 1. A single qubit may represent a one, a zero, or any quantum superposition of those two qubit states. The classical computer component 304 may provide classical state preparation signals Y32 to the quantum computer 102, in response to which the quantum computer 102 may prepare the states of the qubits 104 in any of the ways disclosed herein, such as in any of the ways disclosed in connection with FIGS. 1 and 2A-2B.

Once the qubits 104 have been prepared, the classical processor 308 may provide classical control signals Y34 to the quantum computer 102, in response to which the quantum computer 102 may apply the gate operations specified by the control signals Y34 to the qubits 104, as a result of which the qubits 104 arrive at a final state. The measurement unit 110 in the quantum computer 102 (which may be implemented as described above in connection with FIGS. 1 and 2A-2B) may measure the states of the qubits 104 and produce measurement output Y38 representing the collapse of the states of the qubits 104 into one of their eigenstates. As a result, the measurement output Y38 includes or consists of bits and therefore represents a classical state. The quantum computer 102 provides the measurement output Y38 to the classical processor 308. The classical processor 308 may store data representing the measurement output Y38 and/or data derived therefrom in the classical memory 310.

The steps described above may be repeated any number of times, with what is described above as the final state of the qubits 104 serving as the initial state of the next iteration. In this way, the classical computer 304 and the quantum computer 102 may cooperate as co-processors to perform joint computations as a single computer system.
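The co-processor loop described above can be illustrated with a minimal sketch in which a classical routine repeatedly sends signals to a quantum device, receives a classical bit, stores it, and derives a statistic from it. Everything named here is an assumption made for illustration: `StubQuantumDevice` is a classical stand-in that ignores the signals it receives, `hybrid_loop` is a hypothetical controller, and the update rule is plain Bayesian inference on a grid using the standard-sampling likelihood for a ±1-valued observable, not the engineered likelihood function. Each round also re-prepares the qubits instead of reusing the previous final state.

```python
import random


class StubQuantumDevice:
    """Hypothetical classical stand-in for the quantum computer 102."""

    def __init__(self, true_expectation, seed=0):
        self.true_expectation = true_expectation
        self.rng = random.Random(seed)

    def run(self, prep_signals, control_signals):
        # A +/-1-valued observable with expectation E yields bit 0
        # (eigenvalue +1) with probability (1 + E) / 2.
        # The signals are accepted but ignored by this stub.
        p_zero = (1.0 + self.true_expectation) / 2.0
        return 0 if self.rng.random() < p_zero else 1


def hybrid_loop(device, n_rounds, grid_size=201):
    """Alternate device calls with classical Bayesian updates on a grid."""
    grid = [-1.0 + 2.0 * k / (grid_size - 1) for k in range(grid_size)]
    posterior = [1.0 / grid_size] * grid_size   # flat prior in classical memory
    record = []
    for _ in range(n_rounds):
        bit = device.run(prep_signals="reset", control_signals="measure")
        record.append(bit)                       # store the measurement output
        sign = 1.0 if bit == 0 else -1.0
        # Multiply by the likelihood (1 + sign * E) / 2, then renormalize.
        posterior = [w * (1.0 + sign * e) / 2.0 for w, e in zip(posterior, grid)]
        norm = sum(posterior)
        posterior = [w / norm for w in posterior]
    mean = sum(w * e for w, e in zip(posterior, grid))
    return mean, record
```

Calling `hybrid_loop(StubQuantumDevice(0.6), n_rounds=2000)` returns a posterior mean near 0.6 together with the stored bit record; swapping the stub for a real device interface would leave the classical side of the loop unchanged.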

Although certain functions may be described herein as being performed by a classical computer and other functions may be described herein as being performed by a quantum computer, these are merely examples and do not constitute limitations of the present invention. A subset of the functions which are disclosed herein as being performed by a quantum computer may instead be performed by a classical computer. For example, a classical computer may execute functionality for emulating a quantum computer and provide a subset of the functionality described herein, albeit with functionality limited by the exponential scaling of the simulation. Functions which are disclosed herein as being performed by a classical computer may instead be performed by a quantum computer.

The techniques described above may be implemented, for example, in hardware, in one or more computer programs tangibly stored on one or more computer-readable media, firmware, or any combination thereof, such as solely on a quantum computer, solely on a classical computer, or on a hybrid quantum-classical (HQC) computer. The techniques disclosed herein may, for example, be implemented solely on a classical computer, in which the classical computer emulates the quantum computer functions disclosed herein.

The techniques described above may be implemented in one or more computer programs executing on (or executable by) a programmable computer (such as a classical computer, a quantum computer, or an HQC) including any combination of any number of the following: a processor, a storage medium readable and/or writable by the processor (including, for example, volatile and non-volatile memory and/or storage elements), an input device, and an output device. Program code may be applied to input entered using the input device to perform the functions described and to generate output using the output device.

Embodiments of the present invention include features which are only possible and/or feasible to implement with the use of one or more computers, computer processors, and/or other elements of a computer system. Such features are either impossible or impractical to implement mentally and/or manually. For example, it would be impossible to mentally or manually generate random samples from a complex distribution describing an operator P and a state |s⟩.

Any claims herein which affirmatively require a computer, a processor, a memory, or similar computer-related elements, are intended to require such elements, and should not be interpreted as if such elements are not present in or required by such claims. Such claims are not intended, and should not be interpreted, to cover methods and/or systems which lack the recited computer-related elements. For example, any method claim herein which recites that the claimed method is performed by a computer, a processor, a memory, and/or similar computer-related element, is intended to, and should only be interpreted to, encompass methods which are performed by the recited computer-related element(s). Such a method claim should not be interpreted, for example, to encompass a method that is performed mentally or by hand (e.g., using pencil and paper). Similarly, any product claim herein which recites that the claimed product includes a computer, a processor, a memory, and/or similar computer-related element, is intended to, and should only be interpreted to, encompass products which include the recited computer-related element(s). Such a product claim should not be interpreted, for example, to encompass a product that does not include the recited computer-related element(s).

In embodiments in which a classical computing component executes a computer program providing any subset of the functionality within the scope of the claims below, the computer program may be implemented in any programming language, such as assembly language, machine language, a high-level procedural programming language, or an object-oriented programming language. The programming language may, for example, be a compiled or interpreted programming language.

Each such computer program may be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor, which may be either a classical processor or a quantum processor. Method steps of the invention may be performed by one or more computer processors executing a program tangibly embodied on a computer-readable medium to perform functions of the invention by operating on input and generating output. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, the processor receives (reads) instructions and data from a memory (such as a read-only memory and/or a random access memory) and writes (stores) instructions and data to the memory. Storage devices suitable for tangibly embodying computer program instructions and data include, for example, all forms of non-volatile memory, such as semiconductor memory devices, including EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROMs. Any of the foregoing may be supplemented by, or incorporated in, specially-designed ASICs (application-specific integrated circuits) or FPGAs (Field-Programmable Gate Arrays). A classical computer can generally also receive (read) programs and data from, and write (store) programs and data to, a non-transitory computer-readable storage medium such as an internal disk (not shown) or a removable disk. These elements will also be found in a conventional desktop or workstation computer as well as other computers suitable for executing computer programs implementing the methods described herein, which may be used in conjunction with any digital print engine or marking engine, display monitor, or other raster output device capable of producing color or gray scale pixels on paper, film, display screen, or other output medium.

Any data disclosed herein may be implemented, for example, in one or more data structures tangibly stored on a non-transitory computer-readable medium (such as a classical computer-readable medium, a quantum computer-readable medium, or an HQC computer-readable medium). Embodiments of the invention may store such data in such data structure(s) and read such data from such data structure(s). 

1. A method for improved quantum amplitude estimation for reducing a number of measurements to generate a statistic accurately, comprising: selecting, with a classical computer, a plurality of quantum-circuit-parameter values to optimize an accuracy-improvement rate of the statistic estimating an expectation value ⟨s|P|s⟩ of an observable P with respect to a quantum state |s⟩; applying, to one or more qubits of a quantum computer, a sequence of alternating first and second generalized reflection operators to transform the one or more qubits from the quantum state |s⟩ into a reflected quantum state, each of the first and second generalized reflection operators being controlled according to a corresponding one of the plurality of quantum-circuit-parameter values; measuring the plurality of qubits in the reflected quantum state with respect to the observable P to obtain a set of measurement outcomes; and updating, on the classical computer, the statistic with the set of measurement outcomes.